identityProperties: a proposal for a simpler and better-founded replacement for JSON-ref within JSON-schema

Matthew W

May 28, 2010, 12:01:05 PM
to JSON Schema
Hi again

I've been thinking about this in a bit more depth and come up with a
proposal which should hopefully clarify my thinking:

The aim is to allow JSON schemas to define an equivalence relation on
fragments of JSON tree, which can be used to deserialize a JSON tree
into a graph of JSON objects (which may potentially contain cross-
references and cyclic references).

This is a simpler facility than JSON-ref, since it does not require
reference to URIs or document fragment identifiers. It may however
still be used in combination with the hyper-schema extensions to use
URI-based resolution where this is desired, and a very similar syntax
and approach to json-ref might be used with it if desired (via eg
"identityProperties": ["$ref"] and an accompanying hyper-schema link
relation).

I believe that this proposal would provide a more solid logical
foundation for the metacircular definition of json-schema, and make
json-schema simpler to implement by reducing its dependency on the
more advanced features of json-ref.


Details of what would need adding to the JSON-schema spec:

A schema of type 'object' may specify a list of 'identityProperties',
eg:

{
"type": "object",
"properties": {
"id": {"type": "string"}
},
"identityProperties": ["id"]
}

Where a client is able to resolve [1] one or more JSON objects, each
of which is subject to a common schema [2] which specifies
identityProperties, and where these objects agree on their values for
all of the defined identityProperties:

* These objects MAY be interpreted by clients as having the same
identity

* When deserializing the JSON tree, references to objects with the
same identity MAY be unified as references to a common object instance
(resulting in an object graph, rather than just an object tree)

* Where this type of unification is performed by a client, clients MUST
use the most fully-defined (with respect to properties present on the
object) of the set of resolvable objects of the same identity, as the
unified common object instance [3].

* Schema validation is defined to use the graph-based semantics of the
deserialized object graph for a schema object, using
identityProperties as described above [2] -- and this semantics MUST
be used by all validators.
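
As a rough illustration of the unification rules above, here is a minimal Python sketch. It is not part of the proposal: it assumes one set of identityProperties applies to every object in the tree, it approximates "most fully-defined" as "has the most properties", and the helper names (identity_key, containers, unify) are made up for this example.

```python
import json

def identity_key(obj, identity_properties):
    """Hashable identity for obj, or None if any identity property is missing."""
    if not identity_properties or any(name not in obj for name in identity_properties):
        return None
    return tuple(json.dumps(obj[name], sort_keys=True) for name in identity_properties)

def containers(node):
    """Yield every dict and list inside a parsed JSON value, outermost first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from containers(value)
    elif isinstance(node, list):
        yield node
        for item in node:
            yield from containers(item)

def unify(tree, identity_properties):
    """Turn a JSON tree into an object graph by merging objects of equal identity."""
    nodes = list(containers(tree))  # snapshot taken while the data is still a tree
    canonical = {}
    for obj in nodes:
        if isinstance(obj, dict):
            key = identity_key(obj, identity_properties)
            # Assumption: "most fully-defined" means "has the most properties".
            if key is not None and (key not in canonical or len(obj) > len(canonical[key])):
                canonical[key] = obj

    def resolve(value):
        # Objects without a complete identity are never unified with anything.
        if isinstance(value, dict):
            return canonical.get(identity_key(value, identity_properties), value)
        return value

    for node in nodes:
        if isinstance(node, dict):
            for name in list(node):
                node[name] = resolve(node[name])
        else:
            for index, item in enumerate(node):
                node[index] = resolve(item)
    return resolve(tree)

tree = [{"id": "a", "name": "Alice"}, {"id": "b", "friend": {"id": "a"}}]
graph = unify(tree, ["id"])
print(graph[1]["friend"] is graph[0])  # True: the stub now points at the full object
```

The two-pass shape (collect canonical instances first, then relink) is one possible design; it avoids walking the structure after cycles have been introduced.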

Notes:

1. Resolution of objects by identityProperties

For the purposes of unifying objects with the same identityProperties
values, a client MAY resolve matching objects via any suitable
resolution mechanism available for this purpose.

In particular, clients performing any such unification MUST resolve
all objects of the same identity within the context of any particular
JSON tree which is being validated or deserialized.

They may also resolve canonical fully-defined versions of objects via
any available canonical resolution mechanisms for the object schema
and identity scheme in question, including:

* Lookup by identity in a canonical document store or database
* Lookup by fetching a canonical URI for the object identity in
question, where additional hypertext metadata allows such a URI to be
inferred from the identityProperty values. Note that the existing
hyper-schema standard would be ideal for this purpose.

2. Identity of schema objects

Specifying what it means in general for multiple objects to be subject
to "the same" schema, and allowing for cyclic references within schema
objects themselves, requires a concept of identity for schema objects
themselves. This would be specified in a metacircular fashion via
specifying identityProperties on the metaschema for JSON schemas.

3. Inconsistent objects with the same identityProperties

For the described unification to be well-defined, objects to be
unified must agree not just on their identityProperties, but also on
any other properties which they both define. Where such objects
disagree, they are termed inconsistent; for example, {"id": 1, "foo":
"bar"} and {"id": 1, "foo": "baz"} are inconsistent. The presence of
such inconsistent objects within a single JSON tree MUST necessarily
prevent a consistent graph-based deserialization of that tree from
succeeding, and as such MAY be interpreted as a validation failure for
that JSON tree.

Note that inconsistencies between objects contained in one resolved
JSON tree of a given identity, and objects in a distinct JSON tree
which is intended to replace or update it, may be tolerated and indeed
useful when update semantics are required.
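
The consistency condition in note 3 could be checked along these lines. This is only a sketch: the function name conflicts is hypothetical, and it compares values with ordinary Python equality, which the proposal does not specify.

```python
def conflicts(a, b):
    """Return the property names on which two same-identity objects disagree."""
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])

print(conflicts({"id": 1, "foo": "bar"}, {"id": 1, "foo": "baz"}))  # ['foo']
print(conflicts({"id": 1, "foo": "bar"}, {"id": 1}))                # []
```

An empty result means the two objects can be unified; a property defined on only one of them is not a conflict.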

4. Handling of missing identityProperties

identityProperties may be specified as optional in a schema; if one or
more optional identityProperties are missing, the object in question
will not be identified with any other objects; in particular two
objects of the same schema with missing identityProperties MUST NOT be
unified.


A more involved example: a schema for 'foo' objects, which must
contain a reference to another foo object under a property 'foo'.
First the schema (assuming identityProperties: ['id'] is specified in
the metaschema for schemas, although some other property could be
used):

{
"id": "foo-schema",
"type": "object",
"properties": {
"id": {"type": "string"},
"foo": {"id": "foo-schema"},
"bar": {"type": "integer"}
},
"identityProperties": ["id"]
}

and a valid foo object:

{
"id": "foo",
"foo": {"id": "foo"},
"bar": 123
}

If graph-based deserialization was performed on the basis of this
schema, this would result in an object with a cyclic reference to
itself, ie the result of:

foo = {"id": "foo", "bar": 123}
foo.foo = foo

Note that although the 'foo' and 'bar' properties are not optional,
{"id": "foo"} is still considered valid despite lacking these
properties: since it is replaced, under the graph-based
deserialization semantics used for validation, by the most fully-
defined resolved instance of this id within the JSON tree, namely the
parent object, which contains the properties in question.
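
For illustration, the cyclic result described above can be built directly in Python. The sketch also shows that Python's standard json module refuses to serialize such a structure, which is one way to see why the serialized tree has to carry a stub like {"id": "foo"} in place of the cycle. The variable names are made up for this example.

```python
import json

foo = {"id": "foo", "bar": 123}
foo["foo"] = foo  # the object now refers to itself

print(foo["foo"]["foo"]["bar"])  # 123, reachable through the cycle

try:
    json.dumps(foo)
    serializable = True
except ValueError:
    # json.dumps rejects circular structures by default (check_circular=True)
    serializable = False
print(serializable)  # False
```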

Comments welcome!

-Matt

Kris Zyp

May 29, 2010, 10:29:04 AM
to json-...@googlegroups.com
I am not sure I understand how having two mechanisms for
linking/referencing is simpler, or an improvement over a single
mechanism? And how is referencing an object "foo" easier if it is a
non-URL identity than a (relative) URL identity? Either way the
reference id can be "foo". Is the issue that it would be easier to
handle the meta-schemas if we used id-based references instead of
fragment identifiers (so one wouldn't need to implement fragment
handlers in order to interpret the meta-schema)? That is a perfectly
legitimate alternate way to represent the graph, so I'd be fine with
adjusting that.

Also note that id-based referencing can't be used to reference arrays
(obviously arrays can't declare their identity).

Kris


Matthew W

May 29, 2010, 12:22:02 PM
to JSON Schema
Kris,

Sorry I think I failed slightly in motivating this sufficiently - hope
the following helps.

> I am not sure I understand how having two mechanisms for
> linking/referencing is simpler, or an improvement over a single
> mechanism?

Perhaps a misunderstanding here - I'm proposing this as the only
linking mechanism, to replace json-ref, although noting that in
combination with hyper-schema it's compatible with a useful subset of
json-ref. In a way it's more general, in that it allows users of json-
schema to use any (or no!) resolution mechanisms they wish on top of
identity-based object graph linking.

> And how is referencing an object "foo" easier if it is a
> non-URL identity than a (relative) URL identity? Either way the
> reference id can be "foo".

I guess the advantage lies in decoupling the schema model from
concepts (URIs, relative URIs) and semantics which are specific to
hypertext protocols.

That seemed like a road you were already heading down with the
separation of hypertext-specific metadata into the hyper-schema
extensions. And it's an approach I like, since ultimately json data
can exist and can be described independently of its availability via
any specific hypertext protocols or URIs. It's just data. You
shouldn't have to make up pretend URIs (or pretend relative URIs -
relative to what?) for it unless you're actually making it available
at those URIs. Otherwise hypertext clients might misinterpret it and
try to fetch URIs which don't exist.

hypertext-specific schema extensions could add extra semantics telling
clients how to resolve references by identity, but the existence of
such a resolution mechanism isn't required in order to express the
identity semantics of data within a schema. Another side-effect is
there's somewhat less redundancy. hyper-schema can already tell you
how to construct a URI from a numeric ID, so why the need to include
the URI itself with all references to the object. Wouldn't it be nice
to (say) change the URI templates without needing to update the
references in all your json.

identityProperties also has a nice symmetry to it in that all
instances of an object specify their identity in the same way. This is
quite nice if you want to be able to describe objects of a particular
identity at different levels of detail, all the while knowing that in
the resulting object graph they'll be treated as referring to the most
detailed version.

One other (pretty big for me!) advantage of using identity-based
referencing rather than relative fragment identifiers is that it
avoids the need for all json references within a tree to agree on what
the root object of that tree is.

Say I pull the following JSON object out of a cache -- let's say for
the sake of argument it was incredibly expensive to compute:

{"foo": {"$ref": "#"}}

And I want to stuff it into an array and send it back to the client.
Naively I do this:

[{"foo": {"$ref": "#"}}]

But now the semantics have actually changed; my object no longer
refers to itself, but to the new root object. In order to fix the
semantics, I have to go through my cached json fragments and change
all json-refs, prefixing the path to the position of the fragment
within the new root object.

[{"foo": {"$ref": "#.0"}}]

Sticking together cached json fragments is something we do quite a lot
server-side; often it's done using simple server-side-include
techniques in servers like nginx, so there's no opportunity to rewrite
references even if you could stomach the extra CPU load over a simple
cache fetch.
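
The change in meaning can be reproduced with a small sketch. The resolver below is an assumption made for illustration: it treats "#" as the document root and follows dot-separated steps as in the "#.0" example above, which is not necessarily how json-ref defines fragment paths, and the name resolve_fragment is hypothetical.

```python
def resolve_fragment(root, ref):
    """Follow a root-relative reference such as "#", "#.0" or "#a.b" from root."""
    path = ref.lstrip("#").strip(".")
    steps = path.split(".") if path else []
    node = root
    for step in steps:
        node = node[int(step)] if isinstance(node, list) else node[step]
    return node

cached = {"foo": {"$ref": "#"}}
print(resolve_fragment(cached, cached["foo"]["$ref"]) is cached)   # True: refers to itself

wrapped = [cached]  # the same fragment, now embedded in an array
print(resolve_fragment(wrapped, cached["foo"]["$ref"]) is cached)  # False: "#" is now the array
print(resolve_fragment(wrapped, "#.0") is cached)                  # True only after rewriting
```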

> Is the issue that it would it be easier to
> handle the meta-schemas if we used id-based references instead of
> fragment identifiers (so one wouldn't need to implement fragment
> handlers in order to interpret the meta-schema)? That is perfectly
> legitimate alternate way to represent the graph, so I'd be fine with
> adjusting that.

Yep, that's one of the advantages, but the other perhaps bigger goal
was that it fixes and defines the semantics of interpreting the
graph-based structure of data when validating it, which currently (as
far as I can make out) is broken in the way validation is defined.

At present validation isn't defined in a way which expands (or rather,
joins up into a graph) json references before applying validation
constraints, leading to problems like the one I mentioned at the end
of that other thread, where {"$ref": "/foo/bar"} would fail to
validate in a context where some non-optional property is required,
even if that property is present on the referenced json object, since
it's not defined literally on the json reference object. I think this
may have gone unnoticed when it comes to the metaschemas, which would
validate anyway since all properties on a schema happen to be
optional.

When the validator hits one of these references, you don't want it to
just say, "ok that validates as a json-ref and since the schema
extends json-ref, json-refs are allowed anywhere; i'm done here let's
move on". You want it to validate against what that reference actually
refers to. Maybe it refers to some data which wouldn't actually
validate in that context, for example.

Logically speaking, validation operates on object graphs, and is
defined in terms of object graphs; you need to first explain how to
get an object graph from the serialized tree with references, before
explaining how to interpret that graph for the purpose of validation.

Now that could be achieved via json-ref or via my identityProperties
proposal; I happen to prefer the latter :) but if you stick with json-
ref the issue then remains. After thinking about it more I think
really for correctness' sake you have no option but to specify the
json-ref resolution, tree -> graph transformation as a transformation
which happens before specifying the schema validation process.
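
A minimal sketch of why the order matters, under stated assumptions: validation is reduced to a check for required properties, and a reference like "/foo/bar" is read as a path inside the same document (json-ref itself treats it as a URI). The names missing_required and resolve_ref are invented for this example.

```python
def missing_required(obj, required):
    """Names of required properties that obj does not define."""
    return [name for name in required if name not in obj]

def resolve_ref(root, obj):
    """Replace a {"$ref": "/a/b"} object by the object found at that path under root."""
    if isinstance(obj, dict) and "$ref" in obj:
        target = root
        for step in obj["$ref"].strip("/").split("/"):
            target = target[step]
        return target
    return obj

document = {
    "foo": {"bar": {"name": "example"}},
    "link": {"$ref": "/foo/bar"},
}
required = ["name"]

# Validating the raw reference object fails, although its target is fine.
print(missing_required(document["link"], required))                         # ['name']
# Resolving first, then validating, gives the intended answer.
print(missing_required(resolve_ref(document, document["link"]), required))  # []
```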

> Also note that id-based referencing can't be used to reference arrays
> (obviously arrays can't declare their identity).

That is admittedly a shortcoming of the identityProperties approach as
opposed to the full-blown fragment identifiers. Although for what it's
worth i've never had a need to identify an array in this way; one
workaround would be to nest the array as a property of an object with
an identity.

Note that there are similar problems for using hyper-schema to specify
a 'self' URI for an array, since arrays don't have properties
(although I guess you could treat '0', '1', '2' etc as properties of
an array, in hyper-schema or identityProperties, which would work for
situations where you're using an array as a tuple)

-Matt

Ganesh and Sashi Prasad

May 29, 2010, 2:41:46 PM
to json-...@googlegroups.com
> I guess the advantage lies in decoupling the schema model from
> concepts (URIs, relative URIs) and semantics which are specific to
> hypertext protocols. [...] since ultimately json data
> can exist and can be described independently of its availability via
> any specific hypertext protocols or URIs. It's just data. You
> shouldn't have to make up pretend URIs (or pretend relative URIs -
> relative to what?) for it unless you're actually making it available
> at those URIs. Otherwise hypertext clients might misinterpret it and
> try to fetch URIs which don't exist.

Matt,

I'm clearly out of my depth here in this larger discussion about identityProperties, but I would like to submit that the conceptual difference between a URL and a URI probably makes practical sense with regard to your argument above. A URI does not have to have anything to do with hypertext or locations. It could be "urn:", for example, not necessarily "http://", and doesn't have to mislead clients into trying to find it at a "location". It's clearly a uniform resource *identifier*, not a uniform resource *locator*. There could be a degree of "location" in terms of fixing the position of an element in a tree of related elements, but not as a resource hosted at a certain web address.

I'm not addressing your larger argument here, just the point you made above re. the implied hypertext nature of URIs.

Regards,
Ganesh



Matthew W

May 29, 2010, 3:46:46 PM
to JSON Schema
> I'm clearly out of my depth here in this larger discussion about
> identityProperties, but I would like to submit that the conceptual
> difference between a URL and a URI probably makes practical sense with
> regard to your argument above. A URI does not have to have anything to do
> with hypertext or locations. It could be "urn://", for example, not
> necessarily "http://", and doesn't have to mislead clients into trying to
> find it at a "location". It's clearly a uniform resource *identifier*, not a
> uniform resource *locator". There could be a degree of "location" in terms
> of fixing the position of an element in a tree of related elements, but not
> as a resource hosted at a certain web address.

Ah yep, good point. Have to admit I forgot about URNs :)

Although technically a URN does still need to follow a certain format
(in particular starting with the "urn:" scheme), right? And relative
addressing modes are only defined for some protocols of URLs (like
HTTP), not for general URIs. So in something like {"uri": "foo"}, the
"foo" couldn't be interpreted as a URN since it lacks a scheme, and if
interpreted as a relative URL it would only work relative to the URL
of the root document (which assumes that the root document, and the
sub-object, are both necessarily available via some hypertext protocol
supporting relative URLs - an assumption which I'd rather wasn't
mandatory in order to link objects within JSON trees)

Using relative URLs also leads to a similar problem to the one I
described with substitutability of JSON trees. Because their semantics
vary depending on the URL which the parent document is served from,
you lose the properties of easy composability / substitutability of
chunks of JSON tree; you have to be careful to re-write any relative
URLs within json fragments to work in the context of the hypertext URI
they're to be served from. Same problem as when using them in HTML in
fact, but not one I'm keen to replicate when writing web services.
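
The dependence on the parent URL is easy to see with Python's standard urllib.parse.urljoin. The example.com URLs are placeholders chosen for this sketch, not anything from a real service.

```python
from urllib.parse import urljoin

snippet_ref = "foo"  # a relative reference stored inside a cached JSON fragment

# The same stored string resolves differently depending on where it is served from.
print(urljoin("http://example.com/a/doc.json", snippet_ref))    # http://example.com/a/foo
print(urljoin("http://example.com/b/c/doc.json", snippet_ref))  # http://example.com/b/c/foo
```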

I think really this substitutability problem (with relative URIs and
also with document-root-relative fragment addressing for references)
is the biggest argument in favour of identity-based linking rather
than fragment-path-based or relative-url-based linking.

It really hurts composability of generated chunks of json tree, if you
have to re-write reference paths and relative URIs for every hypertext
resource context and/or json document root within which you want to
use that chunk of json.

Cheers
-Matt

Ganesh and Sashi Prasad

May 29, 2010, 5:27:49 PM
to json-...@googlegroups.com
Matt,

U R right about URIs :-), and perhaps the URI we use needs to imply a newly-defined "json" URI scheme that supports the notion of a hierarchy without implying a physical location like a web address. A new URI scheme can be tailored to what we want it to do without polluting it with concepts from other domains. We should be able to define and solve the document-root-relative problem in an elegant manner. You're right that what we're dealing with is just data, so it shouldn't be tied to hypertext and its associated protocols. But would a new URI scheme be overkill for what we're trying to solve?

Regards,
Ganesh


Ganesh and Sashi Prasad

May 29, 2010, 5:36:23 PM
to json-...@googlegroups.com
Oh, and a new "json" (or more generically, "datatree") URI scheme may be able to provide a means of supporting an equivalent to XML namespaces. May be worth thinking about...

Regards,
Ganesh

Matthew W

May 29, 2010, 5:50:29 PM
to JSON Schema
> perhaps the URI we use needs to imply a
> newly-defined "json" URI scheme that supports the notion of a hierarchy
> without implying a physical location like a web address. A new URI scheme
> can be tailored to what we want it to do without polluting it with concepts
> from other domains.
...
> Would a new URI scheme be overkill for what we're trying to solve?

Using a URI (whether URN or URL) as the identityProperty would
certainly work, and might be a common use case. Although I would
prefer not to require that you use a URI for the identityProperties.
I'd prefer to let people use whatever properties they want for
identity (eg an ID integer), and I think hyper-schema already does a
great job of describing how to map those properties on a JSON object
to one or more URIs.

I'm not sure how this proposed 'json' URN would work; as I understand
it URNs are supposed to identify a resource within some publicly-
standardised identifier namespace, and maybe aren't so suitable for
ephemeral identifiers which might exist solely to link together two
objects in a json file exchanged privately.

> We should be able to define and solve the document-root-relative problem in an elegant manner.

I think to solve this problem you would have to use some kind of
identity-based linking of objects; this could be done via assigning
absolute (rather than relative) URIs to all objects and using this to
identify them, or it could be done via the more general mechanism I
proposed, with hyper-schema used to describe any mappings to URIs
which are desired, but no obligation to make up URIs for things unless
a URI is genuinely appropriate for it.

Using 'self-relative' rather than 'root-relative' path identifiers
might be one other attempted solution for the json subtree
substitution problem--so eg a relative identifier that says "to find
the object I'm referring to, go up 2 levels from here, and then down
again along this path". But I think that's a bit of a messy fix to be
honest. And while it fixes the substitutability problem, it still
leaves you with the problem that bits of a json tree aren't in general
usable outside the context of the tree they're found in (since
removing them from the parent tree and trying to re-use them elsewhere
might break references contained within).

-Matt

Matthew W

May 29, 2010, 6:44:58 PM
to JSON Schema
After thinking about this some more, I realised that I can find a
quite nice way to make this proposal fully backwards compatible with
the existing relative addressing modes of json-ref. While I don't like
root-relative path references myself for the reasons explained,
obviously it's useful to some people, so it would be nice to be
backwards compatible.

To achieve that I would make the following additions:

* When a schema specifies a string as having format "uri", and the
string occurs within a JSON tree which is resolved via a URI, the
values of these strings may contain relative URIs, which should be
considered expanded relative to the URI of the parent JSON tree for
the purposes of any comparisons or resolution performed with these
values. In particular they should be expanded prior to comparing
identityProperties.

* A special default of "self" may be specified for properties of type
"string" and format "uri". The meaning of this default value is
defined only when the object is contained within a JSON tree which is
resolved via a URI; the default is a URI consisting of the URI of the
parent JSON tree together with a path-based fragment identifier
identifying the object in question within that JSON tree.

[or maybe defaultToSelf: true rather than using a string "self" with
special semantics for the default property in the case of uris? or
some equivalent mechanism]

So then the following would allow json-references of the current
variety to be used in a given situation:

{
"identityProperties": ["$ref"],
"properties": {
"$ref": {
"type": "string",
"format": "uri",
"optional": true,
"default": "self"
}
}
}

then eg this would work:

GET /file.json
Host: foo.com

{"foo": {"$ref": "#"}}

since this would actually expand to:

{"foo": {"$ref": "http://foo.com/file.json#"}, "$ref": "http://foo.com/file.json#"}

which is then linked using the general-purpose identityProperties
semantics as previously specified.
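
The expansion step might look roughly like this in Python. It is a sketch under several assumptions: the schema with the "self" default applies to every object in the tree, only fragment-only references (starting with "#") are made absolute, the base URI carries no fragment of its own, and the path-based fragment uses dot-separated steps. The function name expand is hypothetical.

```python
def expand(node, base_uri, path=()):
    """Return a copy of node with "$ref" values made absolute against base_uri."""
    if isinstance(node, list):
        return [expand(item, base_uri, path + (str(i),)) for i, item in enumerate(node)]
    if not isinstance(node, dict):
        return node
    result = {name: expand(value, base_uri, path + (name,))
              for name, value in node.items() if name != "$ref"}
    if "$ref" in node:
        ref = node["$ref"]
        # Only fragment-only references are handled in this sketch.
        result["$ref"] = base_uri + ref if ref.startswith("#") else ref
    else:
        # The "self" default: document URI plus a path-based fragment.
        result["$ref"] = base_uri + "#" + ".".join(path)
    return result

expanded = expand({"foo": {"$ref": "#"}}, "http://foo.com/file.json")
print(expanded)
```

After this step both objects carry the same "$ref" value, so ordinary identityProperties matching links them without any special handling of fragments.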

I'm hoping this provides the best of both worlds, allowing relative
addressing to be used where desired, but just as a special case of a
more general identity mechanism which doesn't require one to use URIs
or any kind of root-relative mode of reference in order to link
together objects in a JSON tree.

-Matt

Kris Zyp

Jun 1, 2010, 12:46:23 PM
to json-...@googlegroups.com, Matthew W
I think there are basically three core proposals that you are suggesting:

* Merging identity and referencing properties - In json-ref, there are
separate properties for referencing an object and self-identification of
an object. You are suggesting that an identity property be used for
both. The fundamental problem with this approach is that it reduces the
information available for the linker and creates ambiguities that are
more difficult to deal with. For example, if we had
[{id:"foo"},{id:"foo"}], it is not clear if these are both references to
a "foo" object defined elsewhere, or if one of these is the definition
of "foo" (wouldn't matter which one). By using a separate reference
property, referencing is explicit, and the interpreter knows if the
object is really an empty object (except for the identity property) or if
it is referencing some other object (that the interpreter may or may not
be able to retrieve).

* Using non-URI values for identity - I am still not sure why relative
URIs are not suitable for the id-based referencing from the examples.
Relative URIs free you from needing to know anything about the parent
context, and you can use identities just as you describe. You can come
up with your own internal URI scheme (that would never be exposed), if
you want to resolve relative identifiers to full URIs. As Ganesh pointed
out, URIs include the subset of identifiers that are non-locatable
(URNs) as well. If you want to only resolve objects within the current
document ({id:"foo",me:{$ref:"foo"}}), this is perfectly fine with URIs,
and if you don't want to handle references outside the current document
({id:"foo",me:{$ref:"bar"}}), it is still fine with URIs, you just throw
an error indicating that you can't locate the target object
(you have to throw either way, I would assume). Are there situations
where URIs are not adequate for the needs of id-based references?

* Resolving references prior to validation - This is certainly critical
for correct instance validation and is not addressed properly right now.
However, I would think this could be resolved by including language in
the specification that any links that use the "full" relation should be
resolved and substituted prior to validation. Would that be sufficient?

Thanks,
Kris


Matthew Willson

Jun 1, 2010, 2:05:42 PM
to json-...@googlegroups.com, Kris Zyp
> I think there are basically three core proposals that you are suggesting:

Yep - thanks for splitting it up - in retrospect separate emails would have been better.

> * Merging identity and referencing properties - In json-ref, there are
> separate properties for referencing an object and self-identification of
> an object. You are suggesting that an identity property be used for
> both. The fundamental problem with this approach is that it reduces the
> information available for the linker and creates ambiguities that are
> more difficult to deal with. For example, if we had
> [{id:"foo"},{id:"foo"}], it is not clear if these are both references to
> a "foo" object defined elsewhere, or if one of these is the definition
> of "foo" (wouldn't matter which one). By using a separate reference
> property, referencing is explicit, and the interpreter knows if the
> object is really empty object (except for the identity property) or if
> it is referencing some other object (that the interpreter may or may not
> be able to retrieve.

The idea is that the schema, when talking at the level of pure JSON data, just dictates what it means for two instances of this schema to be considered equal. The aim is to describe how to serialize and deserialize bits of graph-based data in a way that doesn't require a root-relative addressing mechanism (see further on for the problem with this).

It would be up to the application to determine how much resolution it attempts to do when looking for instances to equate in the object graph, but it need not limit itself to those present in the document itself. In particular, many applications might want to use hyper-schema to find out exactly what resolution-by-id mechanisms are available for instances of that schema, and then use these in all cases to find the canonical full version of the object to use when joining up the graph. Or, for example, server-side apps might want to look up {id: 1234} in the database, and then switch the canonical full version into the graph before validating, to save clients from having to supply all the non-optional properties for some schema in order to communicate which instance they're talking about when the server already maintains the canonical database for objects of this schema.
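
A sketch of the server-side case, with everything in it made up for illustration: the store is a plain dict standing in for a database, the record and the list of required properties are invented, and resolve_canonical is a hypothetical helper.

```python
# Made-up canonical store standing in for a database table keyed by id.
CANONICAL_STORE = {
    1234: {"id": 1234, "title": "Example", "owner": "matt"},
}

def resolve_canonical(obj, store, identity_property="id"):
    """Swap a stub such as {"id": 1234} for the stored full version, if any."""
    return store.get(obj.get(identity_property), obj)

required = ["id", "title", "owner"]  # hypothetical non-optional properties
stub = {"id": 1234}                  # what the client sent
full = resolve_canonical(stub, CANONICAL_STORE)

print(all(name in stub for name in required))  # False: the raw stub alone would fail
print(all(name in full for name in required))  # True: the resolved object validates
```

An object whose id is not in the store is returned unchanged, so the validator would then see the stub as sent.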

That's in keeping with a desire to separate the description of data (in particular the description of data which comes in a graph not just a tree), which is quite a pure and beautiful domain, from metadata relating to hypertext mechanisms that might be used to resolve that data. It seemed you were already keen on that, based on this json-schema / hyper-schema separation.

That said I realise that this makes the semantics for resolving links a little less obvious / more application-defined, but I think hyper-schema is the place to define the precise application-specific object resolution semantics (since hypertext is exactly what this is!).

So, I think it would be wonderful if there were two layers:

* Some general core concepts which allow json-schema to describe the serialization and deserialization of graph-based data based on identity
* Extra hypertext metadata which pins down exactly how resolution is performed and allows precise hypertext-reference-style semantics where desired

If you think my suggestion doesn't do enough to achieve the latter I'm happy to work on it

> * Using non-URI values for identity - I am still not sure why relative
> URIs are not suitable for the id-based referencing from the examples.
> Relative URIs free you from needing to know anything about the parent
> context, and you can use identities just as you describe. You can come
> up with your own internal URI schema (that would never be exposed)

I think there's some confusion here between relative URIs like in your example ("foo") and those with a fragment identifier ("#foo")

The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.

To explain the problem with them by an analogy, consider why using this kind of relative URL in HTML websites is such a pain in the bum - if you ever need to move some of the HTML files around in the directory structure, it can break the relative URLs (unless all the referenced files move alongside it); and perhaps more critically, if you want to cache some HTML snippet for use at multiple different URLs, you're out of luck because the relative URIs contained within it will mean different things in the different parent URI contexts.

So, I'm guessing you actually meant to suggest eg "#foo":

[{uri: "#foo"}, {uri: "#foo"}]

Which does work a bit better, because the meaning of these links, when interpreted as relative URIs, no longer varies depending on the URL of the tree they're contained within.

> Are there situations
> where URIs are not adequate for the needs of id-based references?


BUT, and this is a big but, their meaning still depends on the context of what is considered the root of the tree they're to be used in. This makes it impossible to cache some fragment of JSON for use in different positions in different parent trees.

That's a big problem for me, and I imagine for others, because we use caching of JSON fragments heavily to help scale our web services.

Here's the simplest example I can think of to illustrate the problem (admittedly rather trivial, but I can construct a more real-world example too if you don't mind it being more verbose)

Say I want an object foo with a link to itself. I want a snippet of JSON for this, which I can re-use in whatever context I like, and I want it to mean the same thing (an object with a link to itself) wherever I use it. So I write:

{"foo": {"$ref": "#"}}

Using a relative URI to refer to myself. This works because I know that (in this instance at least!) I am the root of the tree.

But now, let's say I want to re-use this JSON snippet within another tree:

{"some": {"other": {"tree": {"foo": {"$ref": "#"}}}}}

My object no longer refers to itself, but to the new root object. It means something different; to fix it I would have to re-write the snippet to:

{"some": {"other": {"tree": {"foo": {"$ref": "#some.other.tree"}}}}}

But the precise reason I'm caching these snippets is so I don't have to fiddle around re-generating them for every context I want to use them in :) So this is a bit of a deal-breaker for me when it comes to using "path-from-root" linking rather than identity-based linking.
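To make the breakage concrete, here is a minimal Python sketch. It assumes a hypothetical resolver, `resolve_path_ref`, that treats everything after the '#' as a dot-separated path from the root (the notation used in the example above); it is not the json-ref algorithm itself, just an illustration of why the same snippet means different things in different parent trees.

```python
def resolve_path_ref(root, ref):
    """Resolve a '#'-style reference as a dotted path starting at `root`.

    Hypothetical helper for illustration; the dotted-path syntax is the one
    used in the thread's example, not a quotation from any spec.
    """
    if not ref.startswith("#"):
        raise ValueError("only fragment-style references are handled here")
    node = root
    path = ref[1:]
    for key in (path.split(".") if path else []):
        node = node[key]
    return node


# The cached snippet: "foo" is meant to point back at the snippet itself.
snippet = {"foo": {"$ref": "#"}}

# Used on its own, the snippet is the root, so "#" resolves to the snippet.
standalone_target = resolve_path_ref(snippet, snippet["foo"]["$ref"])

# Embedded unchanged in another tree, "#" now resolves to the new root.
embedded = {"some": {"other": {"tree": snippet}}}
embedded_target = resolve_path_ref(embedded, snippet["foo"]["$ref"])

# Only a rewritten reference gets back to the snippet.
rewritten_target = resolve_path_ref(embedded, "#some.other.tree")
```

Here `standalone_target` is the snippet, `embedded_target` is the outer tree, and only the rewritten path reaches the snippet again, which is exactly the re-generation step that caching was supposed to avoid.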

> * Resolving references prior to validation - This is certainly critical
> for correct instance validation and is not addressed properly right now.
> However, I would think this could be resolved by including language in
> the specification that any links that use the "full" relation should be
> resolved and substituted prior to validation. Would that be sufficient?

Something along those lines should work, yep, although then it no longer really makes sense to have your metaschema extend a schema for json-ref, because the json-refs are expanded before the meta-schema validates itself; the raw reference objects themselves are never validated, at least not during the main validation pass, since they need to be expanded beforehand for the validation process to be correctly defined.

That's kinda what I meant about having to link up the references as a first pass for the meta-schema and the json-schema spec to make full logical sense.

-Matt

Matthew W

unread,
Jun 2, 2010, 9:52:49 AM6/2/10
to JSON Schema, Kris Zyp
As another alternative when it comes to referencing:

Have you seen the 'JSPON' spec?

This appears to have a certain amount in common with JSON-ref, but
also allows a form of identity-based referencing as well as path-from-
root-based and URI-based references.

http://www.jspon.org/JSPON_Core_Spec.htm

So you can do eg:

{"id": "foo", "self": {"$ref": "foo"}}

rather like <input id="foo" /> <label for="foo">...</label> in HTML,
which by the way is another identity-based linking scheme.

Although it's not quite as general as identityProperties, I'd
certainly find it easier to live with this than with JSON-ref.
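As a rough illustration of what identity-based linking buys you, here is a small Python sketch. The function name `link_by_id` and the choice of "id" and "$ref" as the only special properties are assumptions made for the example; this is not the JSPON algorithm, just one simple way such references could be resolved.

```python
def link_by_id(tree):
    """Replace {"$ref": <id>} stubs with the object carrying that "id".

    Illustrative sketch only: assumes ids are unique within the tree and
    that a reference is a dict whose single key is "$ref".
    """
    index = {}

    def collect(node):
        # First pass: index every object that declares an "id".
        if isinstance(node, dict):
            if "id" in node:
                index[node["id"]] = node
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for value in node:
                collect(value)

    def is_ref(value):
        return isinstance(value, dict) and set(value) == {"$ref"}

    def link(node):
        # Second pass: swap reference stubs for the indexed objects.
        if isinstance(node, dict):
            for key, value in list(node.items()):
                if is_ref(value) and value["$ref"] in index:
                    node[key] = index[value["$ref"]]
                else:
                    link(value)
        elif isinstance(node, list):
            for i, value in enumerate(list(node)):
                if is_ref(value) and value["$ref"] in index:
                    node[i] = index[value["$ref"]]
                else:
                    link(value)

    collect(tree)
    link(tree)
    return tree


# The snippet on its own: "self" ends up pointing at the object itself.
alone = link_by_id({"id": "foo", "self": {"$ref": "foo"}})

# The same snippet, unchanged, nested inside another tree.
page = link_by_id({"some": {"other": {"tree": {"id": "foo", "self": {"$ref": "foo"}}}}})
inner = page["some"]["other"]["tree"]
```

In both cases the "self" property points at the object that declares the id, so the snippet can be dropped into any parent tree without being rewritten.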


(As perhaps another illustration of the downside of relative-to-root-
based referencing, imagine if in HTML you had to do:

<input ... /> <label for="html/body/div[0]/div[0]/form/input">

and then you needed to reorganise the DOM or re-use that form in
another HTML page :-)

-Matt

Kris Zyp

unread,
Jun 3, 2010, 8:39:27 AM6/3/10
to Matthew Willson, json-...@googlegroups.com

> The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.
>

I don't see anything in http://www.ietf.org/rfc/rfc3986.txt that forbids
the use of relative URIs in the context of URIs that do not have a
location mechanism. It seems perfectly valid to use relative URIs even
if the identity and references are completely contained in the document.


> To explain the problem with them by an analogy, consider why using this kind of relative URL in HTML websites is such a pain in the bum - if you ever need to move some of the HTML files around in the directory structure, it can break the relative URLs (unless all the referenced files move alongside it); and perhaps more critically, if you want to cache some HTML snippet for use at multiple different URLs, you're out of luck because the relative URIs contained within it will mean different things in the different parent URI contexts.
>
> So, I'm guessing you actually meant to suggest eg "#foo":
>
> [{uri: "#foo"}, {uri: "#foo"}]
>
> Which does works a bit better, because the meaning of these links when interpreted as relative URIs, no longer varies depending on the URL of the tree they're contained within.
>

No, I meant "foo", not "#foo". "foo" is an identity-based reference,
"#foo" is a root-relative JSON path reference.


>
>> Are there situations
>> where URIs are not adequate for the needs of id-based references?
>>
>
> BUT, and this is a big but, their meaning still depends on the context of what is considered the root of the tree they're to be used in. This makes it impossible to cache some fragment of JSON for use in different positions in different parent trees.
>
> That's a big problem for me, and I imagine for others, because we use caching of JSON fragments heavily to help scale our web services.
>
> Here's the simplest example I can think of to illustrate the problem (admittedly rather trivial, but I can construct a more real-world example too if you don't mind it being more verbose)
>
> Say I want an object foo with a link to itself. I want a snippet of JSON for this, which I can re-use in whatever context I like, and I want it to mean the same thing (an object with a link to itself) wherever I use it. So I write:
>
> {"foo": {"$ref": "#"}}
>
> Using a relative URI to refer to myself. This works because I know that (in this instance at least!) I am the root of the tree.
>
> But now, let's say I want to re-use this JSON snippet within another tree:
>
> {"some": {"other": {"tree": {"foo": {"$ref": "#"}}}}}
>
> My object no longer refers to itself, but to the new root object. It means something different; to fix it I would have to re-write the snippet to:
>
> {"some": {"other": {"tree": {"foo": {"$ref": "#some.other.tree"}}}}}
>
> But the precise reason I'm caching these snippets is so I don't have to fiddle around re-generating them for every context I want to use them in :) So this is a bit of a deal-breaker for me when it comes to using "path-from-root" linking rather than identity-based linking.
>
>

Yes, I agree, that is why I was not suggesting fragment identifiers.

>> * Resolving references prior to validation - This is certainly critical
>> for correct instance validation and is not addressed properly right now.
>> However, I would think this could be resolved by including language in
>> the specification that any links that use the "full" relation should be
>> resolved and substituted prior to validation. Would that be sufficient?
>>
> Something along those lines should work, yep, although then it no longer really makes sense to have your metaschema extend a schema for json-ref, because the json-refs are expanded before the meta-schema validates itself; the raw reference objects themselves are never validated, at least not during the main validation pass, since they need to be expanded beforehand for the validation process to be correctly defined.
>
> That's kinda what I meant about having to link up the references as a first pass for the meta-schema and the json-schema spec to make full logical sense.
>

Not sure I exactly follow, are you saying that JSON Schema can't
properly be interpreted to rely on json-ref based on metaschemas because
one would need to understand json-ref in order to interpret the json-ref
schema? I agree, although that simply seems like it would necessitate
some bootstrapping information about how to do the initial reference
resolution/substitution (before reaching json-ref, which would simply
"agree" with the current bootstrap resolution mechanism). Or do you mean
something else?

--
Thanks,
Kris

Matthew Willson

unread,
Jun 3, 2010, 11:33:35 AM6/3/10
to json-...@googlegroups.com, Kris Zyp
>> The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.

> No, I meant "foo", not "#foo". "foo" is a identity based reference,
> "#foo" is a root-relative JSON path reference.

OK, well that's great news in that the intention is to support identity-based referencing after all. So I misunderstood somewhat and we've been talking past each other on this. But there are some problems to be resolved with the syntax for it to work in terms of relative URIs in the URI spec, which contributed to this misunderstanding.

I understood 'foo' as a relative URL to mean "a file called foo in the same directory as this file" (which is how it's interpreted for example in HTML). Relative URI references do appear to be defined in the URI spec in this sense, and only for hierarchical URI schemes:

A relative reference (Section 4.2) refers to a resource by describing
the difference within a hierarchical name space between the reference
context and the target URI. The reference resolution algorithm,
presented in Section 5, defines how such a reference is transformed
to the target URI. As relative references can only be used within
the context of a hierarchical URI, designers of new URI schemes
should use a syntax consistent with the generic syntax's hierarchical
components unless there are compelling reasons to forbid relative
referencing within that scheme.

Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analogous to <a href="#section-1"> in an HTML page to jump to <h2 id="section-1">.

As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).

So yeah the confusion arises from the fact that (as I understand it, but I may again misunderstand) you effectively want to support two different forms of fragment-based identifier: an identity-based fragment identifier (like the way HTML defines fragment identifiers - the #section-1 example above) and a path-based identifier (like #path.from.root.to.object)

So you could fix it by, for example:
* Using the '#' in both cases, but add some syntax to the fragment identifier after the '#' to distinguish these two cases, eg #id:foo vs #path:the.path.to.foo
* Adding something explicitly to the spec for json-ref to explain when a ref is to be treated as an identity-based fragment identifier, and when it's to be treated as a path-based fragment identifier, rather than just calling it a URI and relying on the semantics of the URI spec to communicate the distinction
* ...

Anyway yeah I think precisely specifying these semantics for JSON-ref, and making sure they're compatible with the URI spec where appropriate, would avoid any future misunderstandings :)
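For what it's worth, the first option could be as simple as the following Python sketch. The "#id:" and "#path:" prefixes are just the hypothetical syntax floated above, and `parse_fragment_ref` is a made-up name; nothing here is taken from the json-ref spec.

```python
def parse_fragment_ref(ref):
    """Split a fragment reference into (kind, value).

    Hypothetical syntax from the suggestion above:
      "#id:foo"               -> identity-based reference to the object with id "foo"
      "#path:the.path.to.foo" -> path-from-root reference, as a list of keys
    """
    if not ref.startswith("#"):
        raise ValueError("not a fragment reference: %r" % ref)
    kind, sep, value = ref[1:].partition(":")
    if sep and kind == "id":
        return ("id", value)
    if sep and kind == "path":
        return ("path", value.split(".") if value else [])
    raise ValueError("fragment must start with 'id:' or 'path:': %r" % ref)


by_id = parse_fragment_ref("#id:foo")
by_path = parse_fragment_ref("#path:the.path.to.foo")
```

With an explicit prefix, a client never has to guess which of the two resolution strategies a given reference expects.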

> Not sure I exactly follow, are you are saying that JSON Schema can't
> properly be interpreted to rely json-ref based on metaschemas because
> one would need to understand json-ref in order to interpret the json-ref
> schema? I agree, although that simply seems like it would necessitate
> some bootstrapping information about how to do the initial reference
> resolution/substition (before reaching json-ref, which would simply
> "agree" with the current bootstrap resolution mechanism). Or do you mean
> something else?

Yep that's pretty much what I mean. And yeah it should be relatively easy to fix by defining a bootstrapping phase to get you from json-ref to an object graph, before going on to define the rest of the json-schema semantics in terms of those object graphs. Personally I think it wouldn't hurt to specify json-ref as a distinct underlying standard, cos some people might wanna use it without the schema stuff, but I'm not overly bothered either way.

Note that both the schema object itself, and the object which is being validated, need JSON-refs expanding prior to validation; you might also need to specify exactly how validation is defined in the case of cyclic references / a general object graph, in the schema and in the object being validated. Eg once a node of the graph has been visited once by the validator, it's marked as valid and doesn't need to be visited again. Maybe that goes without saying, but it wouldn't hurt to specify precisely, in case it reveals any awkward corner-cases with extending the spec for validation from trees to graphs.
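The "visit each node once" rule could look something like this Python sketch. It is only an illustration under stated assumptions: `validate_graph` and the per-node `check` callback are invented names, and a real validator would pair each node with the schema that applies to it rather than use a single predicate.

```python
def validate_graph(node, check, visited=None):
    """Validate every node reachable from `node`, visiting each container once.

    `check` is a hypothetical per-node predicate standing in for real schema
    validation. A container that has already been visited is treated as valid
    and not descended into again, so cycles terminate.
    """
    if visited is None:
        visited = set()
    if isinstance(node, (dict, list)):
        if id(node) in visited:
            return True
        visited.add(id(node))
    if not check(node):
        return False
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    return all(validate_graph(child, check, visited) for child in children)


def has_name(node):
    # Toy rule: every object must carry a "name" property.
    return not isinstance(node, dict) or "name" in node


# An object graph with a self-reference and a two-node cycle.
a = {"name": "a"}
b = {"name": "b", "peer": a}
a["peer"] = b
a["self"] = a

result = validate_graph(a, has_name)

# A node that breaks the toy rule, linked into the same cyclic graph.
c = {"peer": a}
bad_result = validate_graph(c, has_name)
```

One corner case this makes visible: a node is marked as visited before its own check has finished, so a cycle back to it is provisionally accepted. Whether that is the right semantics is exactly the kind of thing the spec would need to pin down.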

About the role of the json-ref schema: you could still usefully have a schema file for json-ref, and refer to it in the spec to help describe the pre-conditions for the bootstrapping phase. But that description would be in a circular fashion, given that the json-ref bootstrapping needs to be defined before you can define how to interpret the json-ref schema. So I think a plain English description would be needed too.

> (before reaching json-ref, which would simply "agree" with the current bootstrap resolution mechanism)

I don't think schemas (whose validation semantics is defined on the object graph not on the tree-with-reference-object-stubs) could meaningfully extend that json-ref schema via the semantics of the normal extension mechanism; validation against the parent json-ref schema would happen on the object graph which would no longer contain the references, so it wouldn't be meaningful.

Does that make any sense?

-Matt

Kris Zyp

unread,
Jun 3, 2010, 12:02:17 PM6/3/10
to json-...@googlegroups.com, Matthew Willson

On 6/3/2010 9:33 AM, Matthew Willson wrote:
>>> The former kind of relative URI, "foo", means approximately "a file called foo in the same directory as this file", is only really defined for URL protocols which have filesystem-like path components (HTTP, FTP, …), and very much depends on the parent context.


>>>
>
>> No, I meant "foo", not "#foo". "foo" is a identity based reference,
>> "#foo" is a root-relative JSON path reference.
>>
> OK, well that's great news in that the intention is to support identity-based referencing after all. So I misunderstood somewhat and we've been talking past eachother on this. But there are some problems to be resolved with the syntax for it to work in terms of relative URIs in the URI spec, which contributed to this misunderstanding.
>
> I understood 'foo' as a relative URL to mean "a file called foo in the same directory as this file" (which is how it's interpreted for example in HTML). Relative URI references do appear to be defined in the URI spec in this sense, and only for hierarchical URI schemes:
>
> A relative reference (Section 4.2) refers to a resource by describing
> the difference within a hierarchical name space between the reference
> context and the target URI. The reference resolution algorithm,
> presented in Section 5, defines how such a reference is transformed
> to the target URI. As relative references can only be used within
> the context of a hierarchical URI, designers of new URI schemes
> should use a syntax consistent with the generic syntax's hierarchical
> components unless there are compelling reasons to forbid relative
> referencing within that scheme.
>
>

Just to be clear, this doesn't specify anywhere that the sibling with
the hierarchical URI needs to be a file. URIs are for specifying
"resources" which is a more general concept (a file is just one type of
thing that can be a resource).


> Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analagous to <a href="#section-1"> in a HTML page to jump to <h2 id="section-1">.
>
> As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).
>

No, the identifier based referencing is not intended to be a fragment
style link. Rather the identifier (the property with the relation of
"self", which is the "id" property in json-ref) is defining the
resource, thus indicating it can be referenced as a resource on its own,
even though it is embedded in the representation of another resource.
Therefore when we have an instance of a json-ref schema:

{
  "something": {
    "id": "foo",
    "me": {"$ref": "foo"}
  }
}

We are saying that the inner object (the property value of something) is
a resource on its own, that is identified by the relative URI "foo".
This could have a resolution to a full URI of whatever://foo, but if it
can be resolved internally, then the full URI doesn't actually need to
be made visible or really computed. The reference is then linking to the
resource "foo", which has already been defined. The resource "foo" is
indeed embedded in the resource representation above (and thus could be
referenced by fragments), but fragments are not necessary since it is
also a separate resource itself. The "foo" resource is distinct from the
resource above (which hasn't been identified, and doesn't need to be)
even though it was embedded in the view.

Consequently the above block is a valid use of URIs and I believe
provides the type of identifier-based referencing that you are after.

> So yeah the confusion arises from the fact that (as I understand it, but I may again misunderstand) you effectively want to support two different forms of fragment-based identifier: an identity-based fragment identifier (like the way HTML defines fragment identifiers - the #section-1 example above) and a path-based identifier (like #path.from.root.to.object)
>
> So you could fix it by, for example:
> * Using the '#' in both cases, but add some syntax to the fragment identifier after the '#' to distinguish these two cases, eg #id:foo vs #path:the.path.to.foo
> * Adding something explicitly to the spec for json-ref to explain when a ref is to be treated as an identity-based fragment identifier, and when it's to be treated as a path-based fragment identitier, rather than just calling it a URI and relying on the semantics of the URI spec to communicate the distinction
> * ...
>
> Anyway yeah I think precisely specifying these semantics for JSON-ref, and making sure they're compatible with the URI spec where appropriate, would avoid any future misunderstandings :)
>
>
>> Not sure I exactly follow, are you are saying that JSON Schema can't
>> properly be interpreted to rely json-ref based on metaschemas because
>> one would need to understand json-ref in order to interpret the json-ref
>> schema? I agree, although that simply seems like it would necessitate
>> some bootstrapping information about how to do the initial reference
>> resolution/substition (before reaching json-ref, which would simply
>> "agree" with the current bootstrap resolution mechanism). Or do you mean
>> something else?
>>
> Yep that's pretty much what I mean. And yeah it should be relatively easy to fix by defining a bootstrapping phase to get you from json-ref to an object graph, before going on to define the rest of the json-schema semantics in terms of those object graphs. Personally I think it wouldn't hurt to specify json-ref as a distinct underlying standard, cos some people might wanna use it without the schema stuff, but i'm not overly bothered either way.
>
> Note that both the schema object itself, and the object which is being validating, need JSON-refs expanding prior to validation; you might also need to specify exactly how validation is defined in the case of cyclic references / a general object graph, in the schema and in the object being validated. Eg once a node of the graph has been visited once by the validator, it's marked as valid and doesn't need to be visited again. Maybe that goes without saying, but it wouldn't hurt to specify precisely, in case it reveals any awkward corner-cases with extending the spec for validation from trees to graphs.
>
> About the role of the json-ref schema: you could still usefully have a schema file for json-ref, and refer to it in the spec to help describe the pre-conditions for the bootstrapping phase. But that description would be in a circular fashion, given that the json-ref bootstrapping needs to be defined before you can define how to interpret the json-ref schema. So I think a plain english description would be needed too.
>
>
>> (before reaching json-ref, which would simply "agree" with the current bootstrap resolution mechanism)
>>
> I don't think schemas (whose validation semantics is defined on the object graph not on the tree-with-reference-object-stubs) could meaningfully extend that json-ref schema via the semantics of the normal extension mechanism; validation against the parent json-ref schema would happen on the object graph which would no longer contain the references, so it wouldn't be meaningful.
>
> Does that make any sense?
>

I agree that we can't just rely on meta-schemas extending json-ref to
give us json-ref. It has to be articulated with English such that the
bootstrapping is sane. The extension of json-ref just creates a logical
consistency to agree with the pre-defined referencing strategy. But
having a referencing mechanism that can be described with English and
later described consistently with JSON Schema seems desirable.

--
Thanks,
Kris

Matthew Willson

unread,
Jun 3, 2010, 12:34:22 PM6/3/10
to json-...@googlegroups.com
> Just to be clear, this doesn't specify anywhere that the sibling with
> the hierarchical URI needs to be a file. URIs are for specifying
> "resources" which is a more general concept (a file is just one type of
> thing that can be a resource).

Yep - I was using file as a quicker (but technically incorrect) way of saying 'resource exposing a JSON representation'

>> Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analogous to <a href="#section-1"> in an HTML page to jump to <h2 id="section-1">.
>>
>> As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).
>
> No, the identifier based referencing is not intended to be a fragment
> style link. Rather the identifier (the property with the relation of
> "self", which is the "id" property in json-ref)

Just to be clear: so the semantics of json-ref depend on the 'self' concept from hyper-schema?

Is this specified anywhere?

This sounds like it might be heading in the direction I originally wanted to go, where rather than treating the 'id' itself as a URI, hyper-schema is used to translate it into a URI… but yeah I'm a bit confused now.

> is defining the
> resource, thus indicating it can be referenced as a resource on its own,
> even though it is embedded in the representation of another resource.
> Therefore when we have an instance of a json-ref schema:
>
> {
>   "something": {
>     "id": "foo",
>     "me": {"$ref": "foo"}
>   }
> }
>
> We are saying that the inner object (the property value of something) is
> a resource on its own, that is identified by the relative URI "foo".

Relative to what though?

Relative URLs are typically interpreted as being relative to the URL of the resource whose representation they are contained within. Certainly unless explicitly specified otherwise that is what I would expect. Again I'd give HTML as the canonical example of a hypertext media type containing relative URIs, where <a href="foo"> links to a resource foo within the same part of the hierarchical namespace of the containing resource, and not to "some-arbitrary-urn-scheme://foo"

Sorry I know I seem like I'm nitpicking here, but this led me to a lot of confusion about json references and I'm still not convinced it's compatible with the URI spec.

If you want "foo", when parsed as a URI, to make sense as a relative URI which is not relative to the URI of the parent resource within a hierarchical URI scheme, but instead relative to some implicit URN scheme (like whatever:// in your example) that would need to be explicitly specified somewhere (either in the spec itself or perhaps via hyper-schema)

In particular some people might actually want to use a genuine hierarchically-relative URI, eg {"$ref": "1234"} to refer to /articles/1234 from /articles/index. As a client, how am I to decide what to consider these URIs relative to? Seems you'd need something analogous to HTML's <base href="…"> tag to do this.

I'm also not convinced the URI spec allows for relative URIs relative to an arbitrary URN scheme unless that scheme is hierarchical ("relative references can only be used within the context of a hierarchical URI") and there is some base URI in that scheme which the relative URI can be considered as relative to.


> This could have a resolution to a full URI of whatever://foo, but if it
> can be resolved internally, then the full URI doesn't actually need to
> be made visible or really computed. The reference is then linking to the
> resource "foo", which has already been defined. The resource "foo" is
> indeed embedded in the resource representation above (and thus could be
> referenced by fragments), but fragments are not necessary since it is
> also a separate resource itself. The "foo" resource is distinct from the
> resource above (which hasn't been identified, and doesn't need to be)
> even though it was embedded in the view.
>
> Consequently the above block is a valid use of URIs and I believe
> provides the type of identifier-based referencing that you are after.
>
> I agree that we can't just rely on meta-schemas extending json-ref to
> give us json-ref. It has to be articulated with English such that the
> bootstrapping is sane. The extension of json-ref just creates a logical
> consistency to agree with the pre-defined referencing strategy. But
> having a referencing mechanism that can be described with English and
> later described consistently with JSON Schema seems desirable.

Hm… I still maintain that schemas in general extending a json-ref schema doesn't make logical sense for the reasons I tried to explain. The relationship isn't "foo extends json-ref", it's "foo acts on already-processed object graphs obtained from trees which validate against json-ref", and the former just doesn't work as a stand-in for the latter, other than as a rather confusing bit of inbuilt documentation.

But I guess I'll just choose not to extend json-ref from my schemas :)

-Matt

Kris Zyp

unread,
Jun 3, 2010, 3:04:38 PM6/3/10
to json-...@googlegroups.com, Matthew Willson


On 6/3/2010 10:34 AM, Matthew Willson wrote:
>> Just to be clear, this doesn't specify anywhere that the sibling with
>> the hierarchical URI needs to be a file. URIs are for specifying
>> "resources" which is a more general concept (a file is just one type of
>> thing that can be a resource).
>
> Yep - I was using file as a quicker (but technically incorrect) way of saying 'resource exposing a JSON representation'
>
>>> Whereas it appears you're using it to mean an identity-based fragment identifier within the same document, analogous to <a href="#section-1"> in an HTML page to jump to <h2 id="section-1">.
>>>
>>> As I read the URI spec, you'd need to identify this as a fragment identifier via a preceding '#' for it to parse as a relative URI with the semantics you want (resolving a fragment within the current document, rather than resolving to a separate file which is referred to relative to the current file's namespace in a hierarchical URI scheme).
>>
>> No, the identifier based referencing is not intended to be a fragment
>> style link. Rather the identifier (the property with the relation of
>> "self", which is the "id" property in json-ref)
>
> Just to be clear: so the semantics of json-ref depend on the 'self' concept from hyper-schema?

Yes

> Is this specified anywhere?

This needs to be defined/specified/added to the spec.

> This sounds like it might be heading in the direction I originally wanted to go, where rather than treating the 'id' itself as a URI, hyper-schema is used to translate it into a URI… but yeah I'm a bit confused now.

>> is defining the
>> resource, thus indicating it can be referenced as a resource on its own,
>> even though it is embedded in the representation of another resource.
>> Therefore when we have an instance of a json-ref schema:
>>
>> {
>>   "something": {
>>     "id": "foo",
>>     "me": {"$ref": "foo"}
>>   }
>> }
>>
>> We are saying that the inner object (the property value of something) is
>> a resource on its own, that is identified by the relative URI "foo".
>
> Relative to what though?
>
> Relative URLs are typically interpreted as being relative to the URL of the resource whose representation they are contained within. Certainly unless explicitly specified otherwise that is what I would expect. Again I'd give HTML as the canonical example of a hypertext media type containing relative URIs, where <a href="foo"> links to a resource foo within the same part of the hierarchical namespace of the containing resource, and not to "some-arbitrary-urn-scheme://foo"
>
> Sorry I know I seem like I'm nitpicking here, but this led me to a lot of confusion about json references and I'm still not convinced it's compatible with the URI spec.
>
> If you want "foo", when parsed as a URI, to make sense as a relative URI which is not relative to the URI of the parent resource within a hierarchical URI scheme, but instead relative to some implicit URN scheme (like whatever:// in your example) that would need to be explicitly specified somewhere (either in the spec itself or perhaps via hyper-schema)

The relative URIs definitely should be relative to the URI of the parent resource. When I suggested whatever://, that meant both the parent resource and the "foo" resource are within that scheme. And that is the point of the use of relative URIs: it doesn't actually even matter what the parent resource URI is if all the URIs are relative. Doing {"id": "foo", "me": {"$ref": "foo"}} means the same thing regardless of what the parent URI is (HTTP, FTP, or whatever). If the parent URI is http://site.com/parent then the identity and reference both resolve to http://site.com/foo and a circular reference is created. If the parent URI is whatever://something/parent then the identity and reference both resolve to whatever://something/foo and a circular reference is created. You can correctly evaluate the references without any knowledge of the parent URI (it doesn't even really need to exist); relative URIs free you from contextual dependence.
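The point that the identity and the reference resolve to the same URI under any hierarchical base can be checked with a short Python sketch using the standard library's `urllib.parse.urljoin`. The base URIs below are made-up examples, and `resolve_pair` is a hypothetical helper; http and ftp bases are used because those are schemes `urljoin` is known to resolve relative references for.

```python
from urllib.parse import urljoin


def resolve_pair(base, ident, ref):
    """Resolve an object's "id" and a "$ref" against the same base URI."""
    return urljoin(base, ident), urljoin(base, ref)


# The same {"id": "foo", "me": {"$ref": "foo"}} object under two different parents.
http_id, http_ref = resolve_pair("http://site.com/parent", "foo", "foo")
ftp_id, ftp_ref = resolve_pair("ftp://host/dir/parent", "foo", "foo")

# The hierarchically-relative case: a reference made from /articles/index.
article = urljoin("http://site.com/articles/index", "1234")
```

Under the http base both values come out as http://site.com/foo, and under the ftp base both come out as ftp://host/dir/foo, so the reference always lands on the object that declares the id. The absolute URI itself still differs between the two parents, which is the contextual dependence being debated in this thread.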



> In particular some people might actually want to use a genuine hierarchically-relative URI, eg {"$ref": "1234"} to refer to /articles/1234 from /articles/index.

And in the context of HTTP, this works perfectly.

> As a client, how am I to decide what to consider these URIs relative to? Seems you'd need something analogous to HTML's <base href="…"> tag to do this.

No, it can be relative to the parent document, just like HTML.

> I'm also not convinced the URI spec allows for relative URIs relative to an arbitrary URN scheme unless that scheme is hierarchical ("relative references can only be used within the context of a hierarchical URI") and there is some base URI in that scheme which the relative URI can be considered as relative to.

But all the commonly used URIs are hierarchical, and if you were using your own scheme for a set of non-locatable ids, there is no reason to shoot yourself in the foot and call it non-hierarchical. In practice, non-hierarchical is the exception, and really doesn't seem to even warrant much attention.

This could have a resolution to a full URI of whatever://foo, but if it
can be resolved internally, then the full URI doesn't actually need to
be made visible or really computed. The reference is then linking to the
resource "foo", which has already been defined. The resource "foo" is
indeed embedded in the resource representation above (and thus could be
referenced by fragments), but fragments are not necessary since it is
also a separate resource itself. The "foo" resource is distinct from the
resource above (which hasn't been identified, and doesn't need to be)
even though it was embedded in the view.

Consequently the above block is a valid use of URIs and I believe
provides the type of identifier-based referencing that you are after.

I agree that we can't just rely on meta-schemas extending json-ref to
give us json-ref. It has to be articulated with English such that the
bootstrapping is sane. The extension of json-ref just creates a logical
consistency to agree with the pre-defined referencing strategy. But
having a referencing mechanism that can be described with English and
later described consistently with JSON Schema seems desirable.

Hm… I still maintain that schemas in general extending a json-ref schema doesn't make logical sense for the reasons I tried to explain. The relationship isn't "foo extends json-ref", it's "foo acts on already-processed object graphs obtained from trees which validate against json-ref", and the former just doesn't work as a stand-in for the latter, other than as a rather confusing bit of inbuilt documentation.

No, the important relationship isn't foo extends json-ref, it is foo is an instance of json-ref, so the "already-processed object graphs" are using the referencing rules defined by json-ref.


But I guess I'll just choose not to extend json-ref from my schemas :)


Yeah, I certainly expect people to extend json-ref, but all schemas are defined to be an instance of json-ref.

-- 
Thanks,
Kris

Matthew Willson

Jun 3, 2010, 10:19:41 PM6/3/10
to json-...@googlegroups.com
Just to be clear: so the semantics of json-ref depend on the 'self' concept from hyper-schema?

Yes
Is this specified anywhere?

This needs to be defined/specified/added to the spec.

OK, will be interested to see this spec defined precisely.

Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.

Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?

Which is the real URI, "http://example.com/given-by-namespace-of-parent-resource/foo" or "uri://given-by-hyper-schema/foo"? Which should be used for resolution of references? why two different URIs?

The relative URIs definitely should be relative to the URI of the parent resource. When I suggested whatever://, that meant both the parent resource and the "foo" resource are within that scheme. And that is the point of using relative URIs: it doesn't actually matter what the parent resource URI is if all the URIs are relative. Doing {"id": "foo", "me": {"$ref": "foo"}} means the same thing regardless of what the parent URI is (HTTP, FTP, or whatever). If the parent URI is http://site.com/parent then the identity and reference resolve to http://site.com/foo and a circular reference is created. If the parent URI is whatever://something/parent then the identity and reference resolve to whatever://something/foo and a circular reference is created. You can correctly evaluate the references without any knowledge of the parent URI (it doesn't even really need to exist); relative URIs free you from contextual dependence.

OK so given that I see a reference to a relative URI 'foo', I should:

* First attempt to find an object in the current JSON tree defined with id: "foo", treating both 'foo's as being relative URIs relative to some undefined base URI of some undefined URI scheme and hence resolving both objects as the same instance having the same URI

* If this object exists then whatever representation might actually be served from http://foo.com/namespace/of/parent/resource/foo is irrelevant; even though the relative URI, taken relative to that of its parent, resolves to this URI I should make no attempt to fetch this URI, even though some resource might exist at this URI with a different, conflicting representation

If this is what is desired it would need specifying precisely; it's not what I'd expect given the URI spec, and not the semantics that I would personally want, since it forces me to pretend that some object within my json tree is necessarily accessible via such a URI when expanded relative to my URI, even if it's not accessible in such a fashion, and even if some other conflicting object might be accessible at such a URI.

I'd prefer to use hyper-schema to define the 'self' URI for identity comparison purposes

As a client, how am I to decide what to consider these URIs relative to? Seems you'd need something analogous to HTML's <base href="…"> tag to do this.
No, it can be relative to the parent document, just like HTML.

OK, although note that HTML uses <a href="#id-value"> for this, not <a href="id-value"> which would cause an attempted fetch of http://example.com/parent-namespace/id-value regardless of whether an element exists in the current DOM tree with id="id-value"
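The difference between the two HTML forms can be checked with Python's standard urllib.parse (a small sketch; the base URL with the trailing "page" segment is an assumed example, not something from the spec):

```python
from urllib.parse import urljoin, urldefrag

base = "http://example.com/parent-namespace/page"

fragment_ref = urljoin(base, "#id-value")   # like <a href="#id-value">
relative_ref = urljoin(base, "id-value")    # like <a href="id-value">

# A fragment-only reference stays within the current document...
same_document = urldefrag(fragment_ref).url == base
# ...whereas a bare relative reference names a different resource.
other_document = urldefrag(relative_ref).url != base

print(fragment_ref)
print(relative_ref)
```

Only the fragment form can be satisfied from the tree already in hand; the bare relative form identifies a sibling resource that a client following the URI spec would be entitled to fetch.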

No, the important relationship isn't foo extends json-ref, it is foo is an instance of json-ref, so the "already-processed object graphs" are using the referencing rules defined by json-ref.

In my example 'foo' was a schema, not an instance. A schema doesn't extend json-ref, since a schema doesn't apply to objects which still have raw json-ref references within them, it applies to an object graph.

-Matt

Kris Zyp

Jun 4, 2010, 12:21:47 AM6/4/10
to json-...@googlegroups.com, Matthew Willson


On 6/3/2010 8:19 PM, Matthew Willson wrote:
Just to be clear: so the semantics of json-ref depend on the 'self' concept from hyper-schema?

Yes
Is this specified anywhere?

This needs to be defined/specified/added to the spec.

OK, will be interested to see this spec defined precisely.

Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.

Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?

No, it should be based on the hyper-schema's target URI, which can be relative or absolute (as in this example). If the target is relative, it resolves in context to the parent resource, and if it is absolute, no context is needed.

Which is the real URI, "http://example.com/given-by-namespace-of-parent-resource/foo" or "uri://given-by-hyper-schema/foo"? Which should be used for resolution of references? why two different URIs?

The relative URIs definitely should be relative to the URI of the parent resource. When I suggested whatever://, that meant both the parent resource and the "foo" resource are within that scheme. And that is the point of using relative URIs: it doesn't actually matter what the parent resource URI is if all the URIs are relative. Doing {"id": "foo", "me": {"$ref": "foo"}} means the same thing regardless of what the parent URI is (HTTP, FTP, or whatever). If the parent URI is http://site.com/parent then the identity and reference resolve to http://site.com/foo and a circular reference is created. If the parent URI is whatever://something/parent then the identity and reference resolve to whatever://something/foo and a circular reference is created. You can correctly evaluate the references without any knowledge of the parent URI (it doesn't even really need to exist); relative URIs free you from contextual dependence.

OK so given that I see a reference to a relative URI 'foo', I should:

* First attempt to find an object in the current JSON tree defined with id: "foo", treating both 'foo's as being relative URIs relative to some undefined base URI of some undefined URI scheme and hence resolving both objects as the same instance having the same URI

* If this object exists then whatever representation might actually be served from http://foo.com/namespace/of/parent/resource/foo is irrelevant; even though the relative URI, taken relative to that of its parent, resolves to this URI I should make no attempt to fetch this URI, even though some resource might exist at this URI with a different, conflicting representation

The resource should also exist at the URI if it is a URL (locatable). It would obviously be undesirable for the target to be in conflict or non-existent, as the server would be providing inconsistent data. If you are dealing with data where one resource can't be trusted to assign a definition for another resource, then the resource should be fetched (this is discussed in the security section of the json schema spec).

Would you like to see the addition of a definition indicating properties that can be treated like anchor ids that can be referenced through fragment identifiers? I have been making sure that it is clear that that is not how rel="self" properties work, and it is slightly different than your pure identity proposal, but maybe that would be a good approach for preserving a pure URI referencing scheme while allowing for intra-document id-based referencing. If we did that, I think I would prefer to have an "anchor" attribute rather than an "identityProperties". For example, schema:

{
  properties: {
    id: {anchor: true},
    $ref: {rel:"full", href:"{$ref}"}
  }
}

instance:
{
   id:"foo",
   me:{$ref:"#foo"}
}
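A rough sketch of how a client might honour such an "anchor" attribute follows. None of this is specified anywhere; the function names are hypothetical, the schema is given in strict JSON form, and for simplicity the one schema is assumed to apply to every object in the tree.

```python
def anchor_properties(schema):
    """Names of properties the schema marks with the proposed anchor: true."""
    return [name for name, sub in schema.get("properties", {}).items()
            if isinstance(sub, dict) and sub.get("anchor") is True]

def index_anchors(node, names, index=None):
    """Walk the tree and record every object under its anchor value."""
    index = {} if index is None else index
    if isinstance(node, dict):
        for name in names:
            if isinstance(node.get(name), str):
                index[node[name]] = node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        index_anchors(child, names, index)
    return index

def anchor_target(value):
    """Return the anchor name if value looks like {"$ref": "#name"}, else None."""
    if isinstance(value, dict) and set(value) == {"$ref"}:
        ref = value["$ref"]
        if isinstance(ref, str) and ref.startswith("#"):
            return ref[1:]
    return None

def link_anchors(node, index):
    """Replace fragment references with the anchored objects, in place."""
    if isinstance(node, dict):
        keys = list(node)
    elif isinstance(node, list):
        keys = range(len(node))
    else:
        return
    for key in keys:
        name = anchor_target(node[key])
        if name is not None:
            node[key] = index[name]      # swapped-in targets are not revisited
        else:
            link_anchors(node[key], index)

schema = {"properties": {"id": {"anchor": True}}}
instance = {"id": "foo", "me": {"$ref": "#foo"}}
link_anchors(instance, index_anchors(instance, anchor_properties(schema)))
```

After the call, instance["me"] is the instance object itself, so the tree has become a cyclic graph without any URI ever being computed.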

-- 
Thanks,
Kris

Matthew Willson

Jun 4, 2010, 7:21:56 AM6/4/10
to json-...@googlegroups.com, Kris Zyp
OK, will be interested to see this spec defined precisely.

Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.

Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?

No, it should be based on the hyper-schema's target URI, which can be relative or absolute (as in this example). If the target is relative, it resolves in context to the parent resource, and if it is absolute, no context is needed.

Aha, great.

Still not sure I've understood it 100% though. So the requirement isn't for the value of that id property to be interpreted as a URI itself (relative or otherwise) - instead the expanded href of the self relation defined by hyper-schema for that object is what determines the URI which an object is defined as having, and this URI can be relative or not depending on your wishes?

(realised I used the wrong syntax for my example, I meant: {"rel": "self", "href": "uri://given-by-hyper-schema/{id}"}  )

That would imply in particular that:

* There's nothing special about the 'id' property - it just happens to be one of the variables in the URI template href of the 'self' link relation

* Different objects might have different hyper-schemas applying to them with different 'self' link relations, meaning that two objects with the same 'id' value are not necessarily equal - eg:

if a hyperschema with {"rel": "self", "href": "some://uri//{id}"} applies to one object with an id: "foo" property
and {"rel": "self", "href": "some://other/uri//{id}"} applies to another object, also with "id": "foo"

these are distinct objects with distinct URIs, neither of which is 'foo' taken as a relative URI relative to the URI of the parent tree.

If there's then a reference to {"$ref": "foo"}, this might not necessarily resolve to either of them; it would expand as a relative URI to http://the.parent/resource/namespace/foo and it would match any object which is an instance of a hyper-schema specifying a 'self' link relation whose target href expands (using the properties of that object) to that URI. It would look for that object first within the current JSON tree, and if that fails it would then attempt to fetch the URI in question in order to obtain the object.
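The lookup order described here (expand each object's 'self' link, compare with the expanded $ref, and only fetch if nothing in the tree matches) could be sketched as below. This is only an illustration: expand is a tiny stand-in for real URI template expansion, the fetch callback is a stub, and the parent URI with its trailing "index" segment is an assumption.

```python
import re
from urllib.parse import urljoin

def expand(template, obj):
    """Tiny stand-in for URI template expansion: {name} -> obj[name]."""
    return re.sub(r"\{([^}]+)\}", lambda m: str(obj[m.group(1)]), template)

def self_uri(obj, hyper_schema, parent_uri):
    """Expand the rel="self" link of the hyper-schema that applies to obj."""
    for link in hyper_schema.get("links", []):
        if link.get("rel") == "self":
            return urljoin(parent_uri, expand(link["href"], obj))
    return None

def resolve_ref(ref, parent_uri, candidates, fetch):
    """candidates: (object, hyper_schema) pairs found in the current tree."""
    target = urljoin(parent_uri, ref["$ref"])
    for obj, hyper_schema in candidates:
        if self_uri(obj, hyper_schema, parent_uri) == target:
            return obj        # found within the current JSON tree
    return fetch(target)      # otherwise fall back to fetching the URI

parent = "http://the.parent/resource/namespace/index"
first = ({"id": "foo"}, {"links": [{"rel": "self", "href": "some://uri//{id}"}]})
second = ({"id": "foo"}, {"links": [{"rel": "self", "href": "some://other/uri//{id}"}]})
local = ({"id": "foo"}, {"links": [{"rel": "self", "href": "{id}"}]})

fetched = []
def fake_fetch(uri):
    fetched.append(uri)       # record the URI instead of doing network I/O
    return {"fetched-from": uri}

miss = resolve_ref({"$ref": "foo"}, parent, [first, second], fake_fetch)
hit = resolve_ref({"$ref": "foo"}, parent, [first, local], fake_fetch)
```

With only the two absolute 'self' templates in play, the reference matches neither object and falls through to the fetch; an object whose 'self' template is the relative "{id}" is the one that matches.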


only problem that occurs to me with this approach is that while objects themselves can allow hyper-schema to expand their full URIs, "$ref" references to those objects can't and have to be based on the full expanded URIs.

That introduces a bit of a coupling / dependency, requiring knowledge of the (possibly arbitrary) URI template for the self relation in order to create a correct $ref object. And means there isn't symmetry between the way identity is expressed in referenced and referencing objects.

So, it would be nice if you could also do something like

{"$ref": true, "id": "foo"}

which would then expand the 'self' relation of the hyper-schema applying it to determine the actual URI for the $ref. 

That then gives a nice symmetry (where desired) between the way identity is expressed for both referenced and referencing objects, and is quite close to what I was aiming for with identityProperties
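As a sketch of that symmetry (purely illustrative: the {"$ref": true, ...} form is only a suggestion in this thread, and expand and uri_of are hypothetical helpers), the same 'self' template can be applied to both the referenced object and the reference to it:

```python
import re

def expand(template, obj):
    """Tiny stand-in for URI template expansion: {name} -> obj[name]."""
    return re.sub(r"\{([^}]+)\}", lambda m: str(obj[m.group(1)]), template)

SELF_TEMPLATE = "uri://given-by-hyper-schema/{id}"

def is_symmetric_ref(obj):
    """True for the suggested {"$ref": true, ...} reference form."""
    return isinstance(obj, dict) and obj.get("$ref") is True

def uri_of(obj, self_template=SELF_TEMPLATE):
    """URI of an object, or of a symmetric reference to it."""
    return expand(self_template, obj)

target = {"id": "foo", "title": "the referenced object"}
reference = {"$ref": True, "id": "foo"}
```

Both uri_of(target) and uri_of(reference) expand to the same URI, so the author of the reference never needs to know the template itself.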



The resource should also exist at the URI if it is a URL (locatable). It would obviously be undesirable for the target to be in conflict or non-existent, as the server would be providing inconsistent data. If you are dealing with data where one resource can't be trusted to assign a definition for another resource, then the resource should be fetched (this is discussed in the security section of the json schema spec).

Hm ok. If (as I think you're saying above) the 'self' relation of the hyper-schema is what actually determines the URI, and hence I can choose to use an absolute URI (a URN even) for that, then this isn't a problem for me.

If not though, if having "foo" in {"id": "foo"} interpreted as a relative URI was my only choice, then there would be a problem with "The resource should also exist at the URI, if it is a URL".

In that case:
* What would I do if I wanted to use 'foo' just to express identity local to that JSON tree (which happens to be served from a HTTP URL say) and I didn't want to make a separate subresource available for 'foo' over http at the relative URI
* Even if I am happy to make these subresources available at the relative URLs in this one case, what if I want to use that JSON snippet within a JSON tree at another URL? do I have to then make the subresources available at URLs relative to that URL too?


Would you like to see the addition of a definition indicating properties that can be treated like anchor ids that can be referenced through fragment identifiers? I have been making sure that it is clear that that is not how rel="self" properties work, and it is slightly different than your pure identity proposal, but maybe that would be a good approach for preserving a pure URI referencing scheme while allowing for intra-document id-based referencing. If we did that, I think I would prefer to have an "anchor" attribute rather than an "identityProperties". For example, schema:

{
  properties: {
    id: {anchor: true},

    $ref: {rel:"full", href:"{$ref}"}
  }
}

bit confused by the $ref property in this schema - it looks like a hyperschema link object (which I thought had to go in a separate 'links' property) not a schema object?

instance:
{
   id:"foo",
   me:{$ref:"#foo"}
}

That sort of helps yeah, although one problem is that it assumes that values of anchor properties on instances of all kinds of different object schemas will happily coexist within the same anchor namespace for a given JSON tree.

In my case I have a bunch of different ID namespaces, so it would be nice to know that, eg {id: 1234} in the case of a 'release' object actually expands (based on hyperschema self) to "/releases/1234" whereas {id: 1234} in the case of a 'recording' object expands to "/recordings/1234" hence they are not the same identity. Hence my suggestion above, which maybe I misunderstood again but seemed to be compatible with your statement about using the hyper-schema's target URI.

Another problem might be that this introduces some ambiguity between the two kinds of fragment identifier (path-based and anchor-based) -- or would you ditch the path-based ones?

Seems like we're getting somewhere gradually with this stuff though! sorry to be such a pain in the arse :)

-Matt

Kris Zyp

Jun 4, 2010, 9:05:46 AM6/4/10
to Matthew Willson, json-...@googlegroups.com


On 6/4/2010 5:21 AM, Matthew Willson wrote:
OK, will be interested to see this spec defined precisely.

Say we have an object {id: "foo"}, to which applies a hyper-schema defining a link relation: {"self": "uri://given-by-hyper-schema/{id}"}.

Should we attempt to treat "foo" itself as the URI for the object in question, relative to that of the parent resource (say http://example.com/given-by-namespace-of-parent-resource/foo) when hyper-schema defines the 'self' relation for this object as having a different URI ("uri://given-by-hyper-schema/foo")?

No, it should be based on the hyper-schema's target URI, which can be relative or absolute (as in this example). If the target is relative, it resolves in context to the parent resource, and if it is absolute, no context is needed.

Aha, great.

Still not sure I've understood it 100% though. So the requirement isn't for the value of that id property to be interpreted as a URI itself (relative or otherwise) - instead the expanded href of the self relation defined by hyper-schema for that object is what determines the URI which an object is defined as having, and this URI can be relative or not depending on your wishes?

(realised I used the wrong syntax for my example, I meant: {"rel": "self", "href": "uri://given-by-hyper-schema/{id}"}  )

That would imply in particular that:

* There's nothing special about the 'id' property - it just happens to be one of the variables in the URI template href of the 'self' link relation

* Different objects might have different hyper-schemas applying to them with different 'self' link relations, meaning that two objects with the same 'id' value are not necessarily equal - eg:

if a hyperschema with {"rel": "self", "href": "some://uri//{id}"} applies to one object with an id: "foo" property
and {"rel": "self", "href": "some://other/uri//{id}"} applies to another object, also with "id": "foo"

these are distinct objects with distinct URIs, neither of which is 'foo' taken as a relative URI relative to the URI of the parent tree.

If there's then a reference to {"$ref": "foo"}, this might not necessarily resolve to either of them; it would expand as a relative URI to http://the.parent/resource/namespace/foo and it would match any object which is an instance of a hyper-schema specifying a 'self' link relation whose target href expands (using the properties of that object) to that URI. It would look for that object first within the current JSON tree, and if that fails it would then attempt to fetch the URI in question in order to obtain the object.


only problem that occurs to me with this approach is that while objects themselves can allow hyper-schema to expand their full URIs, "$ref" references to those objects can't and have to be based on the full expanded URIs.

That introduces a bit of a coupling / dependency, requiring knowledge of the (possibly arbitrary) URI template for the self relation in order to create a correct $ref object. And means there isn't symmetry between the way identity is expressed in referenced and referencing objects.


I am not sure I understand what you mean by lacking symmetry. In json-ref, $ref and id are defined with identical URI templates (just a different variable), and should therefore be resolved in the same way. But another schema could define $ref or any other referencing property (rel:"full" or other relations) to follow alternate URI templates, just like rel:"self" links can.



So, it would be nice if you could also do something like

{"$ref": true, "id": "foo"}

which would then expand the 'self' relation of the hyper-schema applying it to determine the actual URI for the $ref. 

That then gives a nice symmetry (where desired) between the way identity is expressed for both referenced and referencing objects, and is quite close to what I was aiming for with identityProperties



The resource should also exist at the URI if it is a URL (locatable). It would obviously be undesirable for the target to be in conflict or non-existent, as the server would be providing inconsistent data. If you are dealing with data where one resource can't be trusted to assign a definition for another resource, then the resource should be fetched (this is discussed in the security section of the json schema spec).

Hm ok. If (as I think you're saying above) the 'self' relation of the hyper-schema is what actually determines the URI, and hence I can choose to use an absolute URI (a URN even) for that, then this isn't a problem for me.

If not though, if having "foo" in {"id": "foo"} interpreted as a relative URI was my only choice, then there would be a problem with "The resource should also exist at the URI, if it is a URL".

In that case:
* What would I do if I wanted to use 'foo' just to express identity local to that JSON tree (which happens to be served from a HTTP URL say) and I didn't want to make a separate subresource available for 'foo' over http at the relative URI

I guess you would want to argue for the usage of the "anchor" attribute for id-based fragment identifiers.


* Even if I am happy to make these subresources available at the relative URLs in this one case, what if I want to use that JSON snippet within a JSON tree at another URL? do I have to then make the subresources available at URLs relative to that URL too?

Yes



Would you like to see the addition of a definition indicating properties that can be treated like anchor ids that can be referenced through fragment identifiers? I have been making sure that it is clear that that is not how rel="self" properties work, and it is slightly different than your pure identity proposal, but maybe that would be a good approach for preserving a pure URI referencing scheme while allowing for intra-document id-based referencing. If we did that, I think I would prefer to have an "anchor" attribute rather than an "identityProperties". For example, schema:

{
  properties: {
    id: {anchor: true},
    $ref: {rel:"full", href:"{$ref}"}
  }
}

bit confused by the $ref property in this schema - it looks like a hyperschema link object (which I thought had to go in a separate 'links' property) not a schema object?

Yeah, totally messed that up. It should be:

{
  properties: {
    id: {anchor: true}
  },
  links:[{href:"{$ref}", rel:"full"}]
}



instance:
{
   id:"foo",
   me:{$ref:"#foo"}
}

That sort of helps yeah, although one problem is that it assumes that values of anchor properties on instances of all kinds of different object schemas will happily coexist within the same anchor namespace for a given JSON tree.

In my case I have a bunch of different ID namespaces, so it would be nice to know that, eg {id: 1234} in the case of a 'release' object actually expands (based on hyperschema self) to "/releases/1234" whereas {id: 1234} in the case of a 'recording' object expands to "/recordings/1234" hence they are not the same identity. Hence my suggestion above, which maybe I misunderstood again but seemed to be compatible with your statement about using the hyper-schema's target URI.

Sure, so you could choose which type of links you would prefer.


Another problem might be that this introduces some ambiguity between the two kinds of fragment identifier (path-based and anchor-based) -- or would you ditch the path-based ones?

No, we can keep the path-based, and just make it clear in the fragmentResolution strategies defined in the spec that id-based references take precedence, although that would be up to the fragmentResolution strategy (we could have an id-only fragment resolution strategy).
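A sketch of a fragment resolver in which id-based references take precedence over path-based ones is below. The slash-delimited path syntax and the function names are assumptions for illustration; the actual fragmentResolution strategies would be whatever the spec ends up defining.

```python
def find_by_id(node, wanted):
    """Depth-first search for an object whose "id" equals wanted."""
    if isinstance(node, dict):
        if node.get("id") == wanted:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_by_id(child, wanted)
        if found is not None:
            return found
    return None

def follow_path(root, fragment):
    """Path-based resolution: split on "/" and walk keys or list indexes."""
    node = root
    for part in filter(None, fragment.split("/")):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node

def resolve_fragment(root, fragment):
    """Try the id-based strategy first, then fall back to the path-based one."""
    found = find_by_id(root, fragment)
    return found if found is not None else follow_path(root, fragment)

doc = {
    "id": "root",
    "items": [{"id": "items", "value": 1}],
    "other": {"value": 2},
}
by_id = resolve_fragment(doc, "items")          # ambiguous: id wins over the "items" key
by_path = resolve_fragment(doc, "other/value")  # no such id, so the path is walked
```

The "items" fragment is deliberately ambiguous: it is both a property name at the root and the id of a nested object, and this ordering picks the nested object.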

-- 
Thanks,
Kris

Matthew Willson

Jun 4, 2010, 9:56:19 AM6/4/10
to Kris Zyp, json-...@googlegroups.com
> In json-ref, $ref and id are defined with identical URI templates (just a different variable), and should therefore be resolved in the same way. But another schema could define $ref or any other referencing property (rel:"full" or other relations) to follow alternate URI templates, just like rel:"self" links can.

Hmm ok.

So if I have a schema like so:

[
  {
    id: "first-schema",
    properties: {
      "id": {"type": "string"},
      "myself": {"$ref": "first-schema"}
    },
    links: [
      {"rel": "full", "href": "/first-namespace/{$ref}"},
      {"rel": "self", "href": "/first-namespace/{id}"}
    ]
  },
  {
    id: "second-schema",
    properties: {
      "id": {"type": "string"},
      "myself": {"$ref": "second-schema"}
    },
    links: [
      {"rel": "full", "href": "/second-namespace/{$ref}"},
      {"rel": "self", "href": "/second-namespace/{id}"}
    ]
  }
]

With the following instance:

[
  {
    "id": "foo",
    "myself": {"$ref": "foo"}
  },
  {
    "id": "foo",
    "myself": {"$ref": "foo"}
  }
]

And I'll get two separate objects, each with a link to itself? But with those 'foo' ids and $refs being expanded to different URIs (/first-namespace/foo and /second-namespace/foo) based on the hyper-schema full and self relations in force at the points those ids and $refs are found? And the bootstrapping phase which joins up the object graph will take this into account and join things up based on this notion of identity?

I guess that means the object graph bootstrapping phase needs to be defined precisely, in terms of those 'self' and 'full' hyper-schema relations. It would need to be defined in terms that work not just for the json-ref syntax ($ref and id) but with whatever full and self relations are defined.
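One way to picture such a bootstrapping phase for the two-schema example is sketched below. It rests on several assumptions that the spec would have to settle: the nth instance is paired with the nth schema, the containing object's "full" template is applied to the reference objects nested directly inside it, and expand is only a stand-in for real URI template expansion. The schemas are reduced to their links.

```python
import re

def expand(template, obj):
    """Tiny stand-in for URI template expansion: {name} -> obj[name]."""
    return re.sub(r"\{([^}]+)\}", lambda m: str(obj[m.group(1)]), template)

def link_href(schema, rel):
    """href of the first link with the given relation."""
    return next(link["href"] for link in schema["links"] if link["rel"] == rel)

def join_graph(instances, schemas):
    """Index objects by expanded 'self' URI, then replace $ref objects."""
    by_uri = {}
    for obj, schema in zip(instances, schemas):
        by_uri[expand(link_href(schema, "self"), obj)] = obj
    for obj, schema in zip(instances, schemas):
        full = link_href(schema, "full")
        for key, value in list(obj.items()):
            if isinstance(value, dict) and "$ref" in value:
                obj[key] = by_uri[expand(full, value)]
    return by_uri

schemas = [
    {"links": [{"rel": "full", "href": "/first-namespace/{$ref}"},
               {"rel": "self", "href": "/first-namespace/{id}"}]},
    {"links": [{"rel": "full", "href": "/second-namespace/{$ref}"},
               {"rel": "self", "href": "/second-namespace/{id}"}]},
]
instances = [
    {"id": "foo", "myself": {"$ref": "foo"}},
    {"id": "foo", "myself": {"$ref": "foo"}},
]
graph = join_graph(instances, schemas)
```

Under those assumptions the result is two distinct objects, each pointing at itself, keyed by /first-namespace/foo and /second-namespace/foo.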

So yeah provisionally that sounds OK if you can find a watertight way to define that in a non-circular fashion.

I think there may still be some demons lurking though, relating to the problem I raised with the approach of trying to treat validation of raw reference objects, and validation of the bootstrapped object graph created from those objects, at the same semantic level. I think if an attempt was made to define the semantics formally that would reveal trouble.

IMO it really, really does need specifying precisely though, if not completely formally then with as much rigour as can be mustered.

Because there appear to be a lot of assumptions which you're making about the semantics of json-ref which aren't clear to others; given that the semantics of json-schema, hyper-schema and json-ref are at present so closely interconnected, it really needs pinning down. This thread has clarified some of the assumptions, I think, but I won't really feel like I understand fully (and won't have full confidence that there aren't logical flaws in the combined definition of json-ref and json-schema) until I see a more rigorous spec, especially of the json-ref / object-graph-bootstrapping stuff.

I guess this is part of the pain of defining a standard without relying on assumptions implicit in the way some reference implementation does things.


By the way: if the above works the way I described, I don't think the anchor property proposal is required, since you could instead use something like:

[{"rel": "full", "href": "my-app-specific-urn-scheme:{$ref}"}, {"rel": "self", "href": "my-app-specific-urn-scheme:{id}"}]


-Matt

Kris Zyp

Aug 27, 2010, 10:38:30 PM8/27/10
to Matthew Willson, json-...@googlegroups.com
What do you think about, rather than specifying $ref semantics for JSON
schemas, defining "id" as the identity property for the
schema (as we would with json-ref), and then saying that any element that can
take a schema should also allow a string value, where a string value is
interpreted as a URL reference/link to a target schema (relative is
allowed, of course)? So one would write:

{
"id": "my-card",
"name":"A schema that extends card",
"extends":"http://json-schema.org/card",
"properties":{
"another-card": "my-card"
}
}
Rather than:
{
"id": "my-card",
"name":"A schema that extends card",
"extends":{"$ref": "http://json-schema.org/card"},
"properties":{
"another-card": {"$ref": "my-card"}
}
}

Essentially one could describe the grammar of a schema as being either
an object (with all the standard attributes), or a string value, in
which case it should be interpreted as a URL and the target resource
should be used as the schema definition. We might also say that if the
string value is one of the primitive types, then that is a special
indicator that it is a primitively typed instance (since that is how
"type" works).

The advantage of this approach is that it provides more concise
schemas, and it might be slightly easier to describe/define and parse.
I think it would be backwards compatible in the sense that updated
schema validators could handle json-ref-based schemas or this design,
since we weren't using strings in place of schemas before.
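To illustrate the proposed grammar, here is a minimal sketch of how a validator might interpret a value found in a schema position. The names resolve_schema and registry, the base URI, and the exact set of reserved primitive names are assumptions; the registry simply maps absolute URIs to already-loaded schemas instead of fetching anything.

```python
from urllib.parse import urljoin

# Assumed reserved names, mirroring the primitive values that "type" accepts.
PRIMITIVES = {"string", "number", "integer", "boolean",
              "object", "array", "null", "any"}

def resolve_schema(value, base_uri, registry):
    """Interpret a schema position that holds either an object or a string."""
    if isinstance(value, dict):
        return value                      # an ordinary inline schema
    if isinstance(value, str):
        if value in PRIMITIVES:
            return {"type": value}        # reserved primitive name
        return registry[urljoin(base_uri, value)]  # URL reference, possibly relative
    raise TypeError("a schema must be an object or a string")

card = {"id": "http://json-schema.org/card"}
my_card = {
    "id": "my-card",
    "name": "A schema that extends card",
    "extends": "http://json-schema.org/card",
    "properties": {"another-card": "my-card"},
}
base = "http://example.com/schemas/my-card"
registry = {base: my_card, "http://json-schema.org/card": card}

parent_schema = resolve_schema(my_card["extends"], base, registry)
nested = resolve_schema(my_card["properties"]["another-card"], base, registry)
```

Here the relative string "my-card" resolves back to the enclosing schema, while "http://json-schema.org/card" resolves to the registered card schema.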

What do you think, would this be a good change, or problematic?

Matthew Willson

Sep 6, 2010, 11:03:25 AM9/6/10
to Kris Zyp, json-...@googlegroups.com

If you view this just as some syntactic sugar on top of the existing referencing and identity mechanisms which allows you to avoid the {"$ref": …} boilerplate, I'd be inclined to say that, given these things are primarily (?) for machine-readable use, perhaps syntactic sugar should take a back seat to sorting out the semantics in a rigorous way. (If you wanted to add syntactic sugar, it might be easier to define as a separate step how to transform the sugar into a canonical form, and have the metaschemas only operate in terms of this canonical form, in order to stop the metaschemas getting too messy).

Another issue would be: what if you had a schema which is actually a union of 'string' together with some object type which might be supplied via a reference? When looking at an instance, how would you tell whether the string was syntactic sugar for the reference, or an actual string from the union type? (Perhaps your 'special indicator' suggestion was intended as a solution to this - not sure I understood it).

But, I think maybe what you're getting at is less about the special syntax, more that it would be nice if you could define two levels of identity semantics, a simpler one and then a full one based on hyperschema. Which might make bootstrapping easier if the metaschemas themselves stick to using only the simpler level 1 approach.

The level 1 approach would not require hyperschema relations or the use of json-ref, meaning it could be bootstrapped without requiring you to define the semantics of hyperschema stuff. It would just use a standard identity property, "id", with the rule that the object graphs are always to be interpreted as having been joined up by identifying objects in the graph based on the value of this identity property. It would not place any assumptions or constraints on how these identities/references are to be resolved outside the current document.
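A level 1 join of this kind could look roughly like the sketch below. The function name is hypothetical, and the merge rule (later occurrences of an id simply overwrite properties of earlier ones) is an arbitrary choice made here; a real spec would need to say what happens when occurrences conflict.

```python
def unify_by_id(root):
    """Join a JSON tree into a graph by identifying objects with equal "id"."""
    canonical = {}

    def collect(node):
        if isinstance(node, dict):
            ident = node.get("id")
            if isinstance(ident, str):
                # Fold every occurrence of an id into one canonical object.
                canonical.setdefault(ident, {}).update(node)
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            return
        for child in children:
            collect(child)

    def swap(child):
        if isinstance(child, dict) and isinstance(child.get("id"), str):
            return canonical[child["id"]]
        return child

    seen = set()

    def link(node):
        if not isinstance(node, (dict, list)) or id(node) in seen:
            return
        seen.add(id(node))                # guards against the cycles we create
        keys = list(node) if isinstance(node, dict) else range(len(node))
        for key in keys:
            node[key] = swap(node[key])
            link(node[key])

    collect(root)
    root = swap(root)
    link(root)
    return root

shared = unify_by_id({"author": {"id": "kris", "name": "Kris"},
                      "reviewer": {"id": "kris"}})
cyclic = unify_by_id({"id": "a", "next": {"id": "b", "next": {"id": "a"}}})
```

No URI or hyper-schema machinery is involved: the abbreviated {"id": "kris"} occurrence becomes the same object as the full one, and the a/b example turns into a genuine cycle.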

Level 2 could then add extra hypermedia semantics on top of this, allowing hyperschema to be used to specify exactly how any identity/reference properties should be mapped to URIs, or URLs in order to resolve references outside the current document. That would then also allow you to use something other than the default of 'id' as the identity property, and to specify other hypermedia relations as desired.

Ideally the approach would allow for level 2 to be backwards compatible, at least with some interpretation of the current not-quite-pinned-down spec.

I'd like this since it decouples things quite a bit and could reduce the complexity of the json-schema semantics quite a bit. Think it would really require someone to sit down and rigorously hash this stuff out in a formalised way in order to pin it down, ideally writing a reference implementation as part of the process with a bunch of test cases for any awkward corner cases. I imagine a bunch of issues might come up during that process, but without actually attempting it it's a bit hard to say. I'd be willing to give it a go if I had a clear mandate to sort this stuff out, but I feel like I might be stepping on people's toes a bit, or that the standard has already advanced to a point where full backwards compatibility might turn out to be too much of a constraint to really get it right. I'd also probably need to spend a bunch more time making sure I fully understand the motivation and the mental model behind the draft hyperschema / json-ref specs.

Let me know what you think I guess...

Regards,
-Matt

Gary Court

Sep 7, 2010, 10:28:19 PM9/7/10
to JSON Schema
> Another issue would be: what if you had a schema which is actually a union of 'string' together with some object type which might be supplied via a reference? When looking at an instance, how would you tell whether the string was syntactic sugar for the reference, or an actual string from the union type?

+1
I too agree that this would lead to more confusion.

Kris Zyp

Sep 13, 2010, 11:28:54 PM9/13/10
to JSON Schema


On Sep 6, 9:03 am, Matthew Willson <matthew.will...@gmail.com> wrote:
> [snip]
> If you view this just as some syntactic sugar on top of the existing referencing and identity mechanisms which allows you to avoid the {"$ref": …} boilerplate, I'd be inclined to say that, given these things are primarily (?) for machine-readable use, perhaps syntactic sugar should take a back seat to sorting out the semantics in a rigorous way. (If you wanted to add syntactic sugar, it might be easier to define as a separate step how to transform the sugar into a canonical form, and have the metaschemas only operate in terms of this canonical form, in order to stop the metaschemas getting too messy).
>
> Another issue would be: what if you had a schema which is actually a union of 'string' together with some object type which might be supplied via a reference? When looking at an instance, how would you tell whether the string was syntactic sugar for the reference, or an actual string from the union type? (Perhaps your 'special indicator' suggestion was intended as a solution to this - not sure I understood it).

How painful is it to have a reserved set of identifiers/URLs for the
primitive types? What issues would result from having "string",
"number", etc. as a fixed set of reserved identifiers? This seems like
a pretty small set of reserved identifiers. From the resolver's point
of view, handling these would be pretty trivial. From the schema
author's perspective, this is only a problem if you had a preexisting
sibling resource with one of the reserved names that you had to
reference (although if the resource is another schema, it should be
easy to avoid giving it one of the reserved names). Even then, you could
always use "./string" to achieve the same relative resolution, I
believe. Are there aspects that I haven't considered?
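
A rough sketch of what the resolver side might look like, assuming the reserved set is the draft's primitive type names (only "string" and "number" are named above) and using Python's standard urljoin for ordinary relative resolution. resolve_type and the example.com base URI are made up for illustration:

```python
from urllib.parse import urljoin

# Assumption: the reserved identifiers are the draft's primitive type names.
RESERVED_TYPES = frozenset(
    ["string", "number", "integer", "boolean", "object", "array", "null", "any"]
)


def resolve_type(value, base_uri):
    """Return ("builtin", name) for a reserved name, else ("schema", absolute URI)."""
    if value in RESERVED_TYPES:
        # Reserved names bypass the normal relative URL resolution.
        return ("builtin", value)
    return ("schema", urljoin(base_uri, value))


BASE = "http://example.com/schemas/person"
print(resolve_type("string", BASE))    # the built-in string type
print(resolve_type("./string", BASE))  # a sibling resource that happens to be named "string"
print(resolve_type("address", BASE))   # an ordinary relative reference
```

The "./string" form works as an escape hatch because it is not literally equal to a reserved name, yet resolves to the same URI that a bare "string" would have resolved to as a relative reference.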
>
> But, I think maybe what you're getting at is less about the special syntax, more that it would be nice if you could define two levels of identity semantics, a simpler one and then a full one based on hyperschema. Which might make bootstrapping easier if the metaschemas themselves stick to using only the simpler level 1 approach.

I had both of these in mind. I had proposed this both to simplify the
specification of how JSON Schema bootstraps its referencing and to enable
more succinct, readable, and easy-to-author schemas.

> The level 1 approach would not require hyperschema relations or the use of json-ref, meaning it could be bootstrapped without requiring you to define the semantics of hyperschema stuff. It would just use a standard identity property, "id", with the rule that the object graphs are always to be interpreted as having been joined up by identifying objects in the graph based on the value of this identity property. It would not place any assumptions or constraints on how these identities/references are to be resolved outside the current document.
>
> Level 2 could then add extra hypermedia semantics on top of this, allowing hyperschema to be used to specify exactly how any identity/reference properties should be mapped to URIs, or URLs in order to resolve references outside the current document. That would then also allow you to use something other than the default of 'id' as the identity property, and to specify other hypermedia relations as desired.
>
> Ideally the approach would allow for level 2 to be backwards compatible, at least with some interpretation of the current not-quite-pinned-down spec.
>
> I'd like this since it decouples things quite a bit and could noticeably reduce the complexity of the json-schema semantics. I think it would really require someone to sit down and rigorously hash this stuff out in a formalised way in order to pin it down, ideally writing a reference implementation as part of the process, with a bunch of test cases for any awkward corner cases. I imagine a bunch of issues might come up during that process, but without actually attempting it, it's a bit hard to say. I'd be willing to give it a go if I had a clear mandate to sort this stuff out, but I feel like I might be stepping on people's toes a bit, or that the standard has already advanced to a point where full backwards compatibility might turn out to be too much of a constraint to really get it right. I'd also probably need to spend a bunch more time making sure I fully understand the motivation and the mental model behind the draft hyperschema / json-ref specs.
>
> Let me know what you think I guess...

Anything that you would want to help out with is fine by me, my toes
won't hurt. As far as a reference implementation goes, it seems it would
save some time to work on an existing one, but that's up to you. If you
have spec text you want to write, that sounds great too.

Thanks,
Kris

Gary Court

Sep 14, 2010, 4:29:09 PM9/14/10
to JSON Schema
> How painful is it to have a reserved set of identifiers/URLs for the
> primitive types? What issues would result from having "string",
> "number", etc. as a fixed set of reserved identifiers? This seems like
> a pretty small set of reserved identifiers. From the resolver's point
> of view, handling these would be pretty trivial. From the schema
> author's perspective, this is only a problem if you had a preexisting
> sibling resource with one of the reserved names that you had to
> reference (although if the resource is another schema, it should be
> easy to avoid giving it one of the reserved names). Even then, you could
> always use "./string" to achieve the same relative resolution, I
> believe. Are there aspects that I haven't considered?

It's certainly possible to reserve a set of keywords; it just
seems like such a... dirty way of doing it. You're then mixing
keywords with URIs, meaning the semantics of the property value depend
on its value. Every property in JSON Schema to this point has used a
different instance type to indicate a different type of data. For the
type property: if it's a string it's a type, if it's an object it's a
schema, if it's an array it's a union of types. With this change, a
string is either a type or a reference (URI) and the contents must be
examined to determine its meaning. As I said in another post, despite
its verbose nature, {"$ref":"uri"} is unmistakable as a reference;
it's even easier to read as you don't have to know what property it
belongs to to understand that it is a reference.
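
To make the difference concrete, here is a small sketch in Python of the two dispatch rules for the "type" property. Both function names are hypothetical, and the set of primitive type names is assumed from the draft rather than taken from this thread:

```python
# Assumed for illustration: the primitive type names from the draft.
PRIMITIVE_TYPES = frozenset(
    ["string", "number", "integer", "boolean", "object", "array", "null", "any"]
)


def interpret_by_json_type(value):
    """Existing rule: the JSON type of the value alone decides its meaning."""
    if isinstance(value, str):
        return "type"
    if isinstance(value, dict):
        return "schema"  # includes {"$ref": "uri"}, which is still an object
    if isinstance(value, list):
        return "union"
    raise TypeError("unsupported value for 'type': %r" % (value,))


def interpret_with_uri_strings(value):
    """Proposed rule: a string's contents must be examined as well."""
    if isinstance(value, str):
        return "type" if value in PRIMITIVE_TYPES else "reference"
    return interpret_by_json_type(value)


print(interpret_by_json_type("string"))
print(interpret_with_uri_strings("http://example.com/address"))
```

Under the first rule a validator never has to look inside a string to know what kind of thing it is; under the second, the same JSON type maps to two different meanings.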

But I'll be honest, I'm indifferent about this decision. It does have
the advantage of reducing the amount of schema you have to write, and
makes schema referencing easier for simple validators. I think I'd
feel better about this approach if string values for types were always
URIs, and you would have special URIs to denote the primitive types
like "json:string", "json:object", etc.

-Gary

Kris Zyp

Sep 14, 2010, 11:30:14 PM9/14/10
to json-...@googlegroups.com, Gary Court

On 9/14/2010 2:29 PM, Gary Court wrote:
>> How painful is it to have a reserved set of identifiers/URLs for the
>> primitive types? What issues would result from having "string",
>> "number", etc. as a fixed set of reserved identifiers? This seems like
>> a pretty small set of reserved identifiers. From the resolver's point
>> of view, handling these would be pretty trivial. From the schema
>> author's perspective, this is only a problem if you had a preexisting
>> sibling resource with one of the reserved names that you had to
>> reference (although if the resource is another schema, it should be
>> easy to avoid giving it one of the reserved names). Even then, you could
>> always use "./string" to achieve the same relative resolution, I
>> believe. Are there aspects that I haven't considered?
> It's certainly possible to reserve a set of keywords; it just
> seems like such a... dirty way of doing it. You're then mixing
> keywords with URIs, meaning the semantics of the property value depend
> on its value. Every property in JSON Schema to this point has used a
> different instance type to indicate a different type of data. For the
> type property: if it's a string it's a type, if it's an object it's a
> schema, if it's an array it's a union of types. With this change, a
> string is either a type or a reference (URI) and the contents must be
> examined to determine its meaning. As I said in another post, despite
> its verbose nature, {"$ref":"uri"} is unmistakable as a reference;
> it's even easier to read as you don't have to know what property it
> belongs to to understand that it is a reference.

I don't think that the meaning of "string" or "number" is semantically
all that different; one can think of it as being defined as a reference to
a built-in schema that accepts string values or number values. The only
thing that is special is that it doesn't follow normal relative URL
resolution rules (because it would normally be treated as a relative
URL). If you can swallow the URL interpretation, having built-in schemas
seems reasonably consistent.

> But I'll be honest, I'm indifferent about this decision. It does have
> the advantage of reducing the amount of schema you have to write, and
> makes schema referencing easier for simple validators. I think I'd
> feel better about this approach if string values for types were always
> URIs, and you would have special URIs to denote the primitive types
> like "json:string", "json:object", etc.
>

A custom protocol would indeed be more correct :). What is more
important, convenience (and backwards compatibility) or correctness?

--
Thanks,
Kris

Gary Court

Sep 15, 2010, 12:20:25 AM9/15/10
to JSON Schema
> A custom protocol would indeed be more correct :). What is more
> important, convenience (and backwards compatibility) or correctness?

Normally I would say correctness, but since JSON Schema has been
worked on and implemented for some time now, I would say
convenience and backwards compatibility are more important in
this decision.

Gary Court

Sep 16, 2010, 4:46:17 PM9/16/10
to JSON Schema
Hey, I just realized that it would be impossible for us to implement
the "requires" attribute if schemas can also be strings, as a parser
would not be able to tell if the value is supposed to be a property
name or a schema URI.

-Gary

Kris Zyp

Sep 16, 2010, 4:48:33 PM9/16/10
to json-...@googlegroups.com, Gary Court
Good point, we would need to modify "requires".
Kris

--
Thanks,
Kris

Gary Court

Sep 16, 2010, 4:59:59 PM9/16/10
to JSON Schema
> Good point, we would need to modify "requires".

Do you mean you agree we should abandon "schemas can be URIs", or that
the "requires" attribute needs to be modified to support this? I
don't see how we can do the latter without breaking backwards
compatibility.

Kris Zyp

Sep 17, 2010, 3:34:25 PM9/17/10
to json-...@googlegroups.com, Gary Court

On 9/16/2010 2:59 PM, Gary Court wrote:
>> Good point, we would need to modify "requires".

> Do you mean you agree we should abandon "schemas can be URIs", or that
> the "requires" attribute needs to be modified to support this? I
> don't see how we can do the latter without breaking backwards
> compatibility.

I think we solve this by simply not allowing/supporting URIs as the
value for the requires attribute. This is one particular attribute that
takes a schema, but probably rarely needs to reference other
schemas. Most of the time it is used just to add an additional
constraint that is more complex than simple property existence.
Therefore "requires" simply behaves as it did in draft 2 (so it is
backwards compatible). Of course you can still reference another schema
with "requires":
"requires": {
    "extends": "http://some.other/schema"
}
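
As a sketch of that rule in Python: a string value of "requires" is always read as a property name, never a URI, and an object value is read as a schema for the containing instance. check_requires and toy_validate are hypothetical names; toy_validate is only a stand-in for a real validator and assumes the draft 2 convention that listed properties are required unless marked "optional":

```python
def toy_validate(instance, schema):
    """Stand-in validator: each listed property must be present unless optional."""
    for name, subschema in schema.get("properties", {}).items():
        if not subschema.get("optional", False) and name not in instance:
            return False
    return True


def check_requires(instance, prop, requires, validate=toy_validate):
    """Check the "requires" attribute attached to property `prop` of `instance`."""
    if prop not in instance:
        return True  # the constraint only applies when the property is present
    if isinstance(requires, str):
        return requires in instance  # always a property name, never a URI
    if isinstance(requires, dict):
        return validate(instance, requires)  # schema for the containing instance
    raise TypeError("'requires' must be a property name or a schema object")


print(check_requires({"state": "CA", "country": "US"}, "state", "country"))
print(check_requires({"state": "CA"}, "state", "country"))
print(check_requires({"state": "CA"}, "state", {"properties": {"country": {}}}))
```

Resolving an "extends" URI inside the schema object would be the job of the real validator and is not attempted here.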

--
Thanks,
Kris

Gary Court

Sep 18, 2010, 1:19:00 PM9/18/10
to JSON Schema
Ok, I was under the impression that a schema was a union type of
["string", "object"]. But as you're stating here, we can explicitly
choose which attributes accept reference URIs or not. Don't you think
that will be confusing for people? Having to know that requires is
different from all other attributes that accept schemas?

-Gary

Kris Zyp

Sep 18, 2010, 5:30:02 PM9/18/10
to json-...@googlegroups.com, Gary Court

On 9/18/2010 11:19 AM, Gary Court wrote:
> Ok, I was under the impression that a schema was a union type of
> ["string", "object"]. But as you're stating here, we can explicitly
> choose which attributes accept reference URIs or not. Don't you think
> that will be confusing for people? Having to know that requires is
> different from all other attributes that accept schemas?

We could alternately separate requiring a property and requiring a
schema into separate properties. We could have a "requires" or
"requiresProperty" and a "requiresSchema" so there is no ambiguity (or
decide one of them is not needed).

--
Thanks,
Kris

Gary Court

Sep 22, 2010, 2:00:39 PM9/22/10
to JSON Schema
Or we could abandon the idea that schemas can be URI strings. :-) I'm
not big on separating the requires attribute into two, but I don't see
any other way of doing it, as being able to specify a schema is needed
for supporting complex relationships with optional properties.

-Gary