It is sometimes hard to understand how the id works, especially what happens when the root id (the one at the root of the schema) is missing or incomplete and therefore has to be redefined by the environment.
By default, a URL is composed of:
1. A scheme
2. An authority
3. A path
4. A query
5. A fragment
- The scheme is the string that appears before the first ":" in a URL, for example "aScheme:".
- The authority is the string right after the double slash, delimited by the end of the line or by a "/": "//authority" or "//authority/".
- The path is absolute if it starts with a "/", and relative to the current position otherwise. Example: "/path/to/aFile.json". We sometimes call "/path/to/" the base path and the trailing "aFile.json" the relative one.
- The query (not relevant in our case).
- The fragment starts with a "#". In JSON Schema we even have fragment paths (JSON Pointers), for example "#fragment/path".
By default, when accessing a resource we use:
- The scheme: it defines the protocol to use; http, file, ftp and urn are popular schemes.
- The authority: it defines the root server-side document. Example: "www.json.com" if it is a domain, but it can also be "123-456-789-123" if it is a uuid (associated with the urn scheme).
- The path: it defines the server-side path the server has to browse to in order to return the right document. Example: "/path/to/stuff.json".
- The query: if the path points to a service, the query passes arguments to the service that is supposed to return the content.
- The fragment: it defines the client-side document path. Once you have a document, the fragment can point to a part of it using the fragment path (or JSON Pointer).
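To make the five components concrete, here is a minimal Java sketch that splits a URL with the standard java.net.URI class. The class name UriParts and the sample URL are only illustrative, they do not come from any schema library.

```java
import java.net.URI;

final class UriParts {
    /** Returns scheme, authority, path, query and fragment, in that order (absent parts are null). */
    static String[] split(String url) {
        URI u = URI.create(url);
        return new String[] {
            u.getScheme(),
            u.getRawAuthority(),
            u.getRawPath(),
            u.getRawQuery(),
            u.getRawFragment()
        };
    }

    public static void main(String[] args) {
        // Illustrative URL only; any hierarchical URL can be used here.
        String[] parts = split("http://www.json.com/path/to/stuff.json?q=1#/fragment/path");
        String[] names = {"scheme", "authority", "path", "query", "fragment"};
        for (int i = 0; i < parts.length; i++) {
            System.out.println(names[i] + ": " + parts[i]);
        }
    }
}
```

Note that java.net.URI treats something like "urn:uuid:..." as an opaque URI: it has a scheme and a scheme-specific part, but no authority and no path.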
Even if you are used to incomplete URLs like "www.json.com", "/", "#" and so on, the engine behind always has to build an absolute one.
For example, in a browser:
- "www.json.com" will be implicitly resolved to "http://www.json.com/#"
- "/" resolves to the root of your current URL. If you are at "scheme://authority/base/path/relativePath.com#fragment", you will be redirected to "scheme://authority/#" (the path is changed to "/", the fragment is reset)
- "#" resolves to the root of your current document: "scheme://authority/base/path/relativePath.com#fragment/path" becomes "scheme://authority/base/path/relativePath.com#"
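The same kind of resolution can be tried outside a browser. The sketch below uses java.net.URI.resolve on the example URL from the list above; the class name BrowserStyleResolution and the reference "other.json" are made up for the illustration. java.net.URI leaves out a trailing empty "#" when the reference carries no fragment, so the results are written without it.

```java
import java.net.URI;

final class BrowserStyleResolution {
    /** Resolves a (possibly relative) reference against the URL of the current document. */
    static String resolve(String current, String reference) {
        return URI.create(current).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String current = "scheme://authority/base/path/relativePath.com#fragment";
        System.out.println(resolve(current, "/"));          // path replaced by "/", fragment dropped
        System.out.println(resolve(current, "other.json")); // last path segment replaced
        System.out.println(resolve(current, "#top"));       // same document, new fragment
    }
}
```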
Now, when you store schemas in the environment, they are automatically associated with a URL (based on the root "id" property of the schema). If there is no "id" in your schema, or if it is incomplete, the environment completes it for you so that it can retrieve your schema correctly later.
Since the minimum requirement to find a document is the scheme and the path (or authority, depending on the protocol [1]), your environment has to provide a default scheme and be able to forge a unique path (or authority) if none is provided.
Note that by default, if the schema has been retrieved from a URL, the environment uses that URL's scheme, authority and path to create an id; when this information is not available, it is the environment's job to create its own.
For example, the JSV environment uses "urn" as the default scheme, and a uuid (universally unique identifier) to build a default path (or authority; again, it depends on how you see it).
Here is how your schemas will be implicitly resolved (here, in JSV):
A full url
{ "id" : "http://www.json.com/#root" }
Scheme: http
Hier-part: www.json.com/
Fragment: root
Stays the same
{ "id" : "http://www.json.com/#root" }
Scheme: http
Hier-part: www.json.com/
Fragment: root
A URL without any scheme
{ "id" : "www.json.com/#root" }
Scheme:
Hier-part: www.json.com/
Fragment: root
The default "urn" scheme is added
{ "id" : "urn://www.json.com/#root" }
Scheme: urn
Hier-part: www.json.com/
Fragment: root
Anything without any scheme
{ "id" : "mickey" }
Scheme:
Hier-part: mickey
Fragment:
The default "urn" scheme is added
{ "id" : "urn:mickey#" }
Scheme: urn
Hier-part: mickey
Fragment:
Only a scheme
{ "id" : "scheme:" }
Scheme: scheme
Hier-part:
Fragment:
The path is considered as empty (or "")
{ "id" : "scheme:#" }
Scheme: scheme
Hier-part:
Fragment:
Nothing
{ }
Scheme:
Hier-part:
Fragment:
The default "urn" scheme is added, a default path (uuid) is created
{ "id" : "urn:uuid:12345678-1234-1234-1234-1234567890ab#" }
Scheme: urn
Hier-part: uuid:12345678-1234-1234-1234-1234567890ab
Fragment:
Only a fragment
{ "id" : "#mySchemaName" }
Scheme:
Hier-part:
Fragment: mySchemaName
No scheme, no hier-part, it is handled the same way as "Nothing" i.e. The default "urn" scheme is added, a default path (uuid) is created. The fragment stays the same
{ "id" : "urn:uuid:12345678-1234-1234-1234-1234567890ab#mySchemaName" }
Scheme: urn
Hier-part: uuid:12345678-1234-1234-1234-1234567890ab
Fragment: mySchemaName
In summary, in the case where the schema has no provenance (has not been retrieved from a remote url but just inserted into the env):
- If there is no id in the schema root, we create one using a uuid
- If there is no scheme defined in the id, we use "urn" as the default scheme
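The two summary rules can be sketched in a few lines of Java. This is not JSV's code (JSV is not written in Java); the class DefaultId and its method are hypothetical. The sketch simply prepends "urn:" when no scheme is found, so it does not reproduce the "//" that appears in the "www.json.com/#root" example above, since the rule JSV applies there is not spelled out here.

```java
import java.util.UUID;
import java.util.regex.Pattern;

final class DefaultId {
    // Scheme syntax: a letter, then letters, digits, '+', '.' or '-', then ':'.
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    /** Completes a root id that may be null, empty, fragment-only or scheme-less. */
    static String complete(String id) {
        String rest = (id == null) ? "" : id;
        String fragment = "";
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }
        if (rest.isEmpty()) {
            // No id (or only a fragment): forge a unique one from a uuid.
            rest = "urn:uuid:" + UUID.randomUUID();
        } else if (!SCHEME.matcher(rest).find()) {
            // No scheme: fall back to "urn".
            rest = "urn:" + rest;
        }
        return rest + "#" + fragment;
    }

    public static void main(String[] args) {
        System.out.println(complete("mickey"));
        System.out.println(complete("#mySchemaName"));
        System.out.println(complete(null));
    }
}
```

One known limitation of this sketch: a scheme-less id that contains a port, such as "www.json.com:8080/x", looks like it has a scheme, which is one more reason to always write the scheme explicitly.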
[1]: In some cases, there is simply no authority. For example, when you type file:///some/path/to/a/file the authority is blank, i.e. "", which explains why there are 3 slashes. The reason is that you are supposed to be on the computer where you are looking for the file, so there is no need for an authority. It is the same in an environment: we are inside our environment, so an authority is not mandatory. However, it is not obvious whether to call a uuid a path or an authority, because it is neither one nor the other. This is why we usually talk about the "hier-part", which is simply the string identifier able to point to a document according to a scheme.
I still have concerns though, as the spec in theory allows any type of
URI anywhere, and that is a problem.
For instance, this is legal, in theory:
{
"type": "object",
"properties": {
"p1": {
"id": "http://some.site/path/to/my.json#/some/fragment/here"
}
}
}
What are you supposed to do with such a schema?
Similarly, fragments as IDs: why?
{
"type": "object",
"properties": {
"p1": {
"id": "#myfragment"
}
}
}
This conflicts with JSON Pointer and makes two ways to access the same
schema: urn:uuid:whatever:#myfragment and
urn:uuid:whatever:#/properties/p1
This is why I have voiced the URI concern so long ago and maintain my
stance: not every URI should be allowed in "id", and subschemas should
only ever be accessed using JSON Pointer.
--
Francis Galiegue, fgal...@gmail.com
It is not a problem to be able to access the same schema using 2 different paths. It is a bit like using a folder alias on a hard drive, or having different domains pointing to the same IP address.
The opposite case would be problematic, i.e. having 2 schemas belonging to the same URL, but this only happens if you override an existing path with an id. For example:
{
"id" : "#notUnique",
"properties" :
{
"aKey":
{
"id" : "#notUnique"
}
}
}
As long as you are not in this case, you are free to identify your schema or inner schemas the way you want (even if there is no logical meaning to the scheme and hier-part values you set).
For your first example, your env should index the inner schema:
{
"id" : "http://some.site/path/to/my.json#/some/fragment/here"
}
simply under the explicit URL "http://some.site/path/to/my.json#/some/fragment/here".
That's all. That said, I would strongly discourage using a fragment path in an id: it will just mess up your index table and be open to ambiguous interpretation.
An id is risky if not chosen wisely: the purpose of an id is to have something unique and efficient. Nothing prevents you from doing the following:
myscheme://fge/bigProject1/users/#address
myscheme://fge/bigProject1/users/#purchases
myscheme://fge/bigProject1/users/#history
As for the fragment as id, this is just the usual way to name an inner schema. When you specify an id that is not absolute (and a lone fragment obviously is not), you have to resolve it to make it absolute.
When I parse an "id", the rules are as follows:
1. If this "id" is absolute, index it as it is; otherwise resolve it relative to the root "id" property of the containing schema.
2. If the containing schema's root "id" is relative (or empty), resolve it relative to the source "url" from which the schema has been retrieved (for example an html page or a file).
3. If the source does not exist, i.e. the schema has been created from "the code" and not loaded from anywhere, use urn:uuid:123456 as the default URL.
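The three rules can be chained as in the Java sketch below. It is only an illustration under a few assumptions: the class IdResolver and its parameters are hypothetical, and since java.net.URI.resolve returns the reference unchanged when the base is an opaque URI such as urn:uuid:..., a lone fragment is attached to such a base by hand.

```java
import java.net.URI;

final class IdResolver {
    /** Resolves ref against base, including the fragment-only case for opaque bases. */
    static URI resolve(URI base, String ref) {
        URI r = URI.create(ref);
        if (r.isAbsolute()) {
            return r; // rule 1: an absolute id is indexed as it is
        }
        if (base.isOpaque()) {
            if (!ref.startsWith("#")) {
                throw new IllegalArgumentException("cannot resolve " + ref + " against " + base);
            }
            // The scheme-specific part excludes any fragment of the base.
            return URI.create(base.getScheme() + ":" + base.getRawSchemeSpecificPart() + ref);
        }
        return base.resolve(r);
    }

    /**
     * id        : the "id" found in a (sub)schema
     * rootId    : root "id" of the containing schema, or null
     * sourceUrl : where the schema was loaded from, or null
     * fallback  : environment default, e.g. "urn:uuid:..." (rule 3)
     */
    static URI absoluteId(String id, String rootId, String sourceUrl, String fallback) {
        URI base = URI.create(fallback);
        if (sourceUrl != null && !sourceUrl.isEmpty()) {
            base = resolve(base, sourceUrl); // rule 2
        }
        if (rootId != null && !rootId.isEmpty()) {
            base = resolve(base, rootId);    // rule 1, relative part
        }
        return resolve(base, id);
    }

    public static void main(String[] args) {
        String fallback = "urn:uuid:12345678-1234-1234-1234-1234567890ab";
        System.out.println(absoluteId("#fragment", "scheme://authority/aPath/schema.json", null, fallback));
        System.out.println(absoluteId("#myfragment", null, null, fallback));
    }
}
```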
Now, if you do not want to use fragments, here is a "#fragment" vs "/path" comparison:
{
"id":"scheme://authority/aPath/schema.json",
"property":{"key":
{
"id":"#fragment" // accessible through scheme://authority/aPath/schema.json#fragment
}
}
}
{
"id":"scheme://authority/aPath/schema.json",
"property":{"key":
{
"id":"/path" // accessible through scheme://authority/path/#
}
}
}
PS: Note that normalization is important when dealing with URLs. For example, http://domain/something has to be equal to http://domain/something/.//# otherwise your environment will not be able to retrieve anything from the index table…
No, it is not the same.
You mention the case of using a "folder alias in an HD": this does not
identify a resource at all. On a filesystem, I can very well make it
so that file:///a/b/c and file:///foo/bar/baz point to exactly the
same object -- ON THE SAME MACHINE. All I have to do is a well-crafted
set of mount points and symlinks. The only unique identifier for the
resource is local only and is the inode number of the relevant file.
Which can have any other URI for that matter, it is just a matter of
how _I_ decide to make it available. _But that doesn't make it
available to the outside world_.
On the other hand, using JSON Pointer, there are no two ways of
referring to a path into a JSON document (whether it be a schema or
anything else), there can only be ONE. And that is the core value of
it. The _ROOT_ of the document doesn't matter. It may be
"myscheme://myauthority/my/path" or
"http://some.site/path/to/the.json" -- who cares? The third party
accessing the document does NOT have to second guess what
"#somefragment" may refer to if you use JSON Pointer.
Have a look at that:
{
"type": "object",
"id": "#/foo",
"foo": {
"type": "integer"
}
}
And you are asked to validate an instance with this schema and path
"#/foo": what do you do?
I persist: what can be found in "id" MUST be restricted, and NEVER
include fragments. Fragments, in JSON Schema, MUST be the domain of
JSON Pointer, since this is the only spec currently able to address
all of a JSON document, whether it be a JSON Schema or other, in a
unique, non ambiguous way.
Again, why? There is JSON Pointer.
> In a json schema, adding an ID is equivalent to adding an alias in the root
> of the schema. If 2 aliases have the same name, one will override the
> second.
I don't understand, what do you mean?
> Moreover, I would never use fragments path (such as #/foo) as an ID[...]
... but the spec currently allows for it. The spec allows for anything
as an URI within "id" and I consider this an error.
Another example to illustrate my point:
{
"s1": {},
"s2": {
"id": "#/s1"
}
}
This is legal. What is #/s1 in this schema?
If you forbid embedded ids/aliases/etc, forbid non empty fragments in
schema IDs, and mandate the use of JSON Pointer for addressing, you
get rid of all these issues instantly. This is 10 lines to be added to
the spec and we are definitely done with all addressing problem and
simplify implementations a _lot_. You get all the advantages and none
of the drawbacks.
Fair point. But if you have such a JSON Pointer to write, you have a
schema design problem :)
> It is allowed to drink poison but I would not do it, it is allowed to make a
> schema with ambiguous id paths but I would neither not do it. In your
> example, yes it is legal but ambiguous. It is then to your implementation
> convention to rule if the priority is made on json pointers path or on set
> ids.
Disagree. If you leave this up to implementations, you'll end up with
different validations for the same schema. It is a recipe for
disaster... Subschema resolution, with fragment ids and JSON Pointer,
and their priority, should be decided upon and put as a requirement in
the spec proper.
> Even if I agree with you that it is really tricky (not to say dangerous) to
> use fragment paths in ids. This said there is 2 cases where they can be
> really useful
> 1. Redefining a ghost path : you modified your schema but would like to keep
> retrocompatibility, you can still create aliases using {"id":"#/missingKey/"
> "extends":{"$ref":"#/newKey"}}
Careful, "#/missingKey/" and "#/missingKey" are not the same thing!
The first looks for key "" of key "missingKey" from the root of the
JSON document...
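A short Java sketch makes the difference visible by splitting both fragments into reference tokens. The class PointerTokens is hypothetical and, as a simplification, it does not decode any escape sequences.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PointerTokens {
    /** Splits a JSON Pointer fragment such as "#/a/b" into its reference tokens. */
    static List<String> tokens(String fragment) {
        String pointer = fragment.startsWith("#") ? fragment.substring(1) : fragment;
        if (pointer.isEmpty()) {
            return Collections.emptyList(); // the whole document
        }
        if (pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("not a JSON Pointer: " + fragment);
        }
        // The -1 limit keeps trailing empty strings, which String.split drops by default.
        return Arrays.asList(pointer.substring(1).split("/", -1));
    }

    public static void main(String[] args) {
        System.out.println(tokens("#/missingKey").size());  // one token: "missingKey"
        System.out.println(tokens("#/missingKey/").size()); // two tokens: "missingKey" and ""
    }
}
```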
I disagree with this proposal though: if your core schema changes, so
should its id. Problem solved.
> 2. Overriding a schema at somepath : {"id":"#/aDenied/properties/key",
> "deny":"all"}
>
I don't understand that...
> In my opinion, the rfc should keep allowing using fragments in path but put
> a convention on how to manage ambiguous cases (which schema to trust in
> priority). I like the way : the most specific to the more generic i.e.
> longest id to json pointers. (like CSS actually)
>
Fair enough. Here is my proposal:
* at the root of a schema, if there is an ID, it should be absolute,
with either no fragment or an empty fragment;
* IDs in subschemas can only be fragments, but MUST NOT be JSON Pointers;
* no two subschemas in the same schema can have the same ID;
* when resolving a schema referenced via $ref:
- if the URI is absolute, the implementation should split the URI
between the locator and fragment, then resolve the locator; no
fragment and an empty fragment are the same;
- if the fragment is not a JSON Pointer, then the implementation
should look for a subschema with an ID matching the fragment from the
resolved schema;
- if it is a JSON Pointer, then the implementation should compute
the path from the root of the resolved schema (see the JSON Pointer
spec);
- $ref should be resolved recursively;
* if, in the process of $ref resolution, one such event happens:
- the locator cannot be resolved to a JSON document; or
- recursive $ref resolution leads to a loop; or
- a document resolved by $ref is not a JSON Object;
then the implementation MUST consider validation a failure.
This needs better wording, but you get the idea...
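As a rough illustration of the proposal, here is a Java sketch of the splitting and loop-detection steps. Everything in it is an assumption made for the example: the class RefChain, the rule "a fragment starting with '/' is a JSON Pointer", and the refTable map standing in for "this reference leads to that $ref". Fetching and parsing documents is left out.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class RefChain {
    enum Kind { ROOT, JSON_POINTER, SUBSCHEMA_ID }

    /** Part of the reference before '#'. */
    static String locator(String ref) {
        int hash = ref.indexOf('#');
        return hash < 0 ? ref : ref.substring(0, hash);
    }

    /** Part after '#'; no fragment and an empty fragment are treated the same. */
    static String fragment(String ref) {
        int hash = ref.indexOf('#');
        return hash < 0 ? "" : ref.substring(hash + 1);
    }

    static Kind kind(String ref) {
        String f = fragment(ref);
        if (f.isEmpty()) {
            return Kind.ROOT;
        }
        return f.startsWith("/") ? Kind.JSON_POINTER : Kind.SUBSCHEMA_ID;
    }

    /** Follows chained references and fails as soon as one is visited twice. */
    static String follow(String ref, Map<String, String> refTable) {
        Set<String> seen = new LinkedHashSet<>();
        String current = ref;
        while (refTable.containsKey(current)) {
            if (!seen.add(current)) {
                throw new IllegalStateException("$ref loop: " + seen);
            }
            current = refTable.get(current);
        }
        return current;
    }
}
```

With such a table, a reference that points to itself is reported as a failure on the second visit instead of recursing forever.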
- Since uniformity between validators matters, I would suggest creating an open json-schema test suite so that everyone can conform their validator to the same behaviour (especially in tricky cases).
- Reading an RFC is never a pleasure; examples covering all the cases (where applicable) would help a lot (and even more if we could combine them with a common test suite…)
- I opened a topic about the trailing slash "/" in JSON Pointers (well spotted)
- Overriding the path #/properties/anOverridenPath with the schema {"deny":"any"} would make the following document invalid: {"anOverridenPath":"something"} (this could have been useful in MongoDB queries)
For your proposal:
1. I disagree again. The RFC specifies the following fallback: the id is resolved relative to the URL the schema was retrieved from, which is itself resolved relative to the environment. This is only a matter of building absolute paths from relative ones. RFC 2396 tells how to do it, and you should be able to use java.net.URL: URL url = new URL(new URL("http://base.com/url"), "../relativeOne.html");
Normally, new URL(new URL("http://absoluteOne.com"), "http://notARelative.com"); will give you http://notARelative.com
Try (written with java.net.URI, since java.net.URL has no URL(String, String) constructor and rejects the "urn" scheme):
import java.net.URI;
String environment = "urn:"; // your fallback scheme (you could have made a whole fallback id as well, e.g. "urn:uuid:123456")
String downloadedFrom = "//www.domain.com/theFile.json";
URI source = URI.create(downloadedFrom);
if (!source.isAbsolute()) source = URI.create(environment + downloadedFrom); // absolutizes the downloadedFrom url
String rootID = "#fragmentName";
rootID = source.resolve(rootID).toString(); // absolutizes the rootID
-> rootID should be = urn://www.domain.com/theFile.json#fragmentName
-> index your schema in a dictionary under the (absolute and normalized) url rootID, and you are done…
-> add one more step for inner schemas (where the absolute path is the rootID and the relative one is the inner id's url)
If your rootID had been "http://www.mickey.com/#", it would simply not have been changed.
If downloadedFrom had been "", we would simply have skipped this step: new URL(new URL(anAbsolutePath), "").toString().equals(anAbsolutePath) is true.
2. Any URL but no fragment paths (JSON Pointers) -> no more overriding; well… not bad in some way
3. An id should be unique -> I agree; that said, it should already be the case (or would you validate them all? :-) )
4. a. About URL normalization (http://en.wikipedia.org/wiki/URL_normalization): using the URL class will probably take care of this problem by itself. Never store a URL string as is. Create a URL from the string, normalize it (if applicable), then serialize it; this solves a lot of things by itself.
5. I agree with all of those cases:
a. The ref is not found, times out, is unavailable -> failure
b. A recursive $ref will probably result in a stack overflow… the worst case being {"id":"#A", "$ref":"#A"} -> failure
c. The ref is not a valid schema -> failure
On Fri, Mar 9, 2012 at 09:28, Xample <flavien...@gmail.com> wrote:
[...]
> It is allowed to drink poison but I would not do it
... and the best solution for this is to _not_ have poison to drink in
the first place. Hence my proposal.
And if the machine names don't exist, this fails. URL does name
resolution. You cannot rely on that.
Please, understand my point of view: I want to keep things _simple_,
it is the best way to kickstart a more widespread use of JSON Schema.
Restricting the values of "id" the way I proposed it makes
implementations easy.