New features: v5


Geraint

Oct 16, 2013, 7:27:15 AM
to json-...@googlegroups.com
(Firstly: I apologise for taking longer than I promised.  I thought I'd have a draft proposal out by now - I'm writing it, though!)

OK, so I feel confident we've collected all the candidate features for v5 by now, and they're all documented at: https://github.com/json-schema/json-schema/wiki/v5-Proposals

Now, it's time for hammering out details, debating our favourites, and general nitpicking and infighting.  To start us off, here are my current thoughts:

TL;DR - I like almost all of them.  I'm not a great fan of "patternGroups" or "baseUri".  The syntax for "propertyLinks" seems slightly awkward but I want it anyway.

General features:
  • "$data" - I really like this.  It's flexible and powerful, and can be fairly straightforward to implement.
  • Multi-lingual meta-data - I like this.  Simple, flexible, enhances accessibility.
  • "enumNames" - I like this, especially when combined with the multi-lingual meta-data.
  • Ban unknown properties mode - I don't see myself using this personally, but it's pretty simple to implement (speaking from experience), and seems to be useful to some people.

Validation/structural features:

  • Alternative string values for type - 100%
  • "switch" - I like this.  A new keyword is a relatively big change, but it would certainly make my schemas more readable and concise.
  • "patternGroups" - I'm not sure.  It would break compatibility with v4, and I think "minProperties"/"maxProperties" are enough for almost all cases.
  • "contains" - Yes please.  If the "array mode" is allowed, implementation is not completely trivial, but I think it would be useful.

Hyper-schema features:

  • Extended templating syntax - I really like this.  Flexible, powerful, concise, backwards-compatible.
  • Templating for rel - I'd like this as well.
  • "propertyLinks" - I really like the idea of this one.  I'm slightly less sold on the syntax, but I think the functionality is important.
  • "baseUri" - Full of complications.  Not really a fan.
  • "linkSource" - Relatively small change.  It's not something I would use every day, but it would get me out of a tight spot every now and again.

Semantic:

  • "formatMinimum"/"formatMaximum" - Yes.  I've been missing this.
  • Re-introduction of some v3 format values - 100%
  • Additional format values - Yes.
  • "unordered" - I like this.  It would have no effect on validation, but is useful meta-data for lists, especially when dealing with APIs.

Right, your turn - let's hear it. :)

Chris Miles

Oct 16, 2013, 12:21:23 PM
to json-...@googlegroups.com, Geraint
In v3 (version I am using) baseURI is handy as the base for a schema
URI especially when the document has no inherent URI.

Chris


Geraint

Oct 16, 2013, 12:34:57 PM
to json-...@googlegroups.com, Geraint
Do you mean you use it for resolving "$ref"s, or for resolving link URLs?

If it's the first: why do your schemas not have URIs?  A schema without a URI is un-referencable - how big are these schemas?

If it's the second: what do you do when a sub-schema defines the links, but "baseURI" is defined in the parent schema?  What if the sub-schema is referenced directly, not implied using "properties"?

Chris Miles

Oct 17, 2013, 5:00:36 AM
to json-...@googlegroups.com, Geraint
I mean for resolving "$ref"s. We use the schemas for definition during
development; they are not available from a web server.

We use the baseURI when registering the schema with the JSV validator
from Gary Court. For example:
%%%%%%%%%%%%%%%%%%%
{
"$schema":"http://json-schema.org/hyper-schema#",
"baseURI":{"$ref":"http://moonfruit.com/util/src/json/schema/page/"},

"id":"lib/MRef#",
...
%%%%%%%%%%%%%%%%%%%
which gives a URI of
http://moonfruit.com/flash/util/src/json/schema/page/lib/MRef#

A sample reference:
%%%%%%%%%%%%%%%%%%%
{
"$schema":"http://json-schema.org/hyper-schema#",
"baseURI":{"$ref":"http://moonfruit.com/util/src/json/schema/site/"},

"id":"startup/GdsBackground#",
"type":"object",

"properties":{
"src":{"$ref":"../../page/lib/MRef#"},
...
%%%%%%%%%%%%%%%%%%%
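
For readers following the mechanics, here is a minimal sketch of how the relative "$ref" in the second schema could land on the first one. It assumes that "baseURI" and "id" are combined by ordinary relative-URI resolution and that trailing "#" fragments are ignored; this is a guess at the behaviour described above, not a description of how JSV actually works. The helper names are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def schema_uri(base_uri: str, schema_id: str) -> str:
    """Resolve a schema's relative "id" against a base URI, dropping any fragment."""
    absolute, _fragment = urldefrag(urljoin(base_uri, schema_id))
    return absolute


def resolve_ref(referrer_uri: str, ref: str) -> str:
    """Resolve a relative "$ref" against the URI of the schema that contains it."""
    absolute, _fragment = urldefrag(urljoin(referrer_uri, ref))
    return absolute


# Base URIs and ids copied from the two snippets above.
site_base = "http://moonfruit.com/util/src/json/schema/site/"
page_base = "http://moonfruit.com/util/src/json/schema/page/"

referrer = schema_uri(site_base, "startup/GdsBackground#")
target = resolve_ref(referrer, "../../page/lib/MRef#")
print(target)
```

Under these assumptions the reference resolves to the same URI as the "lib/MRef#" schema registered under the page base.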

We have over 200 schemas, totalling over 7,500 lines.

Chris

Geraint

Oct 17, 2013, 12:39:22 PM
to json-...@googlegroups.com, Geraint
Interesting - I hadn't heard of "baseURI" like this.  It's not in any version of the standard, and I'm not aware of any other validators that use it.

What purpose does "baseURI" actually serve?  Wouldn't the same effect be achieved by having "id":"http://moonfruit.com/util/src/json/schema/page/lib/MRef#"?  Also, why does it use a "$ref"?  What are the mechanics of this property?

It seems like it might not be a keyword as such, but more a directive when loading the schemas, to "pretend" that they were fetched from a URI.  That's sort of outside the scope of the schema itself, and sounds like it might be covered more thoroughly by "id" anyway...

reda.b...@gmail.com

Oct 17, 2013, 3:49:04 PM
to json-...@googlegroups.com
Would any feature in v5 tackle "arbitrary extensibility"?

Geraint

Oct 17, 2013, 7:56:47 PM
to json-...@googlegroups.com, reda.b...@gmail.com
The problem is that the ability to specify arbitrary schemas would mean that a set of schemas is no longer a closed system.  Being a closed system is very convenient - it means you can be certain you have pre-fetched all your schemas (so you can validate without a network connection), you can process schemas (e.g. generate Java classes from schemas) without running into unknown trouble, etc.  Hopefully you can see why this might be worth preserving.
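
To make the "closed system" property concrete, here is a rough sketch of the kind of check it enables: given a pre-fetched set of schemas, walk every "$ref" and report any target document that is not in the set. The function names and example URIs are made up for illustration, and the sketch assumes every "$ref" is a plain (non-templated) URI reference.

```python
from urllib.parse import urldefrag, urljoin


def iter_refs(node, base_uri):
    """Yield the absolute document URI (fragment removed) of every "$ref" under node."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            yield urldefrag(urljoin(base_uri, ref)).url
        for value in node.values():
            yield from iter_refs(value, base_uri)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item, base_uri)


def missing_schemas(schemas):
    """Return referenced document URIs that are absent from the pre-fetched set.

    `schemas` maps each schema's absolute URI (no fragment) to its parsed JSON.
    """
    missing = set()
    for uri, schema in schemas.items():
        for target in iter_refs(schema, uri):
            if target not in schemas:
                missing.add(target)
    return missing


# Hypothetical schema set: one reference points outside the set.
schemas = {
    "http://example.com/schemas/person.json": {
        "type": "object",
        "properties": {
            "home": {"$ref": "address.json"},
            "name": {"$ref": "#/definitions/name"},
        },
        "definitions": {"name": {"type": "string"}},
    },
    "http://example.com/schemas/address.json": {
        "type": "object",
        "properties": {
            "location": {"$ref": "http://other.example/geo.json#/definitions/point"}
        },
    },
}
print(missing_schemas(schemas))
```

A check like this can only be exhaustive when every "$ref" is known ahead of time, without looking at instance data.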

The "describedby" solution (full disclosure: I am "cloudfeet" on StackOverflow, so that answer is from me) works as part of the hyper-schema part of the spec, so those concerns do not apply - it would have no role in validation.  Even a hyper-validator should ignore the links when actually validating, and then after the initial validation figure out whether it should perform additional validation against linked schemas.

The possibility of allowing "$ref" to be templated has been raised - however, I am actually against it.  Templating for "$ref" would basically force all validators to be hyper-validators - forcing complication on every validator, as well as losing all the advantages of a closed system.

I believe the more appropriate answer is for hyper-validators to become more common.  If you're working in JavaScript, I would be very happy to put support for "describedby" links into tv4 (there's already support in Jsonary, but that's not primarily aimed at validation).  You could also try requesting it as a feature on the excellent json-schema-suite - the modular structure of that tool may (or may not) come in handy.

I should probably also put up a GitHub wiki page describing what a "hyper-validator" might do, to ensure consistent behaviour.  There's a decent chance that v5 will have a section on "alternative validation modes", so it might well be sketched out in the spec as well.

Geraint

jul...@domdex.com

Oct 17, 2013, 10:11:19 PM
to json-...@googlegroups.com
Just so I'm on record on all of these in one place:

General features:
  • "$data" - +1
  • Multi-lingual meta-data - -0. Metadata keywords confuse me and I feel like they clutter the spec and confuse users sometimes.
  • "enumNames", "formatMinimum", "formatMaximum" - -+-0, I find it really unfortunate how we have this proliferation of not-quite-namespaces rather than structured things, but I don't have a better suggestion and this seems to be the direction we're already going in so...
  • Ban unknown properties mode - indifferent. I also wouldn't use this.

Validation/structural features:

  • Alternative string values for type - unsurprisingly, +1
  • "switch" - undecided. Need to look at this more carefully to see how much I like it. I'm cautiously wary.
  • "patternGroups" - -0 agreed about this having limited use
  • "contains" - +1
  • Re-introduction of some v3 format values, Additional format values - +1
  • "unordered" - -0, again due to my wariness about non-validating properties, but also because I perhaps would do this by writing "type": "set" along with the above

Cheers,

Reda B

Oct 18, 2013, 7:34:17 AM
to Geraint, json-...@googlegroups.com
Ok I understand your concern. Thanks for the insight.

Eric G

Oct 18, 2013, 8:18:39 PM
to json-...@googlegroups.com, reda.b...@gmail.com
This is a very helpful discussion, I thought I'd give it its own thread rather than hijack the new features thread further... :)

I'd very much like to see your thoughts on standards for hyper-validator behavior.  Of course, validation is only one thing you can do with links. But perhaps it's at least one area that could be standardized -- in particular related to "describedBy" and "self" links.

For what it's worth, in the json-schema-suite tools, I left things mostly "do it yourself" for now. The "primitive" operations are basically: 
1) dereference: given any schema object, dereference it (including any remote references); 
2) get schema: fetch and dereference (and cache) a schema at a given link; 
3) follow: follow a given link + method, and fetch the response data together with any dereferenced schema(s) (correlated through http headers).  

Once you have a correlation of schema and instance, you can follow links in it, get a schema at a given link (e.g. "describedBy"), and also follow a local "fragment" link (e.g. "root") to get a new correlation (ie. sub-schema for the root of the instance), and follow links from there, etc.  It's very open-ended at the moment*, has no special behavior attached to particular link rels or media-types, basically only considers the href and method in following links** . 

I don't know if this makes any sense. But it would be relatively easy to write up an example of hyper-validation as you describe it in relation to Reda's case, using these primitive operations. I'll try to do that. Once some typical behaviors are nailed down I'd be happy to add a simpler interface on top.

Eric


* In fact, I think it's a little too open-ended at the moment and may take out some of the features that let you sneak in out-of-band conventions.

** It also validates outgoing data vs the schema and incoming data vs the targetSchema if these are specified

Geraint

Oct 19, 2013, 9:11:20 AM
to json-...@googlegroups.com, reda.b...@gmail.com
Eric: this showed up as a new thread in my inbox, but not in the Google Groups web interface.

I've quoted your email in a new thread (and responded) - I hope that's OK.

Geraint

frontend_dev

Oct 19, 2013, 1:50:56 PM
to json-...@googlegroups.com
  • "$data" - I really like this.  It's flexible and powerful, and can be fairly straightforward to implement.

Still think that this is too restrictive. Maybe too late, but I would simply do it like this:

{
     "properties": {
           "prop1": { ... whatever ... },
           "prop2": {
                    "$data": "path/to/prop1",
                    "type": "number"
           }
     }
}


All that $data does is to "steal" the value specified with the path as you do with your proposal, and uses this value instead of the implicitly derived value. But it would allow for anything, no restrictions.

So, in this case "prop2" would be valid if "prop1" is a number. Of course you could extend that with "allOf" (and also "switch"):

{
     "properties": {
           "prop1": { ... whatever ... },
           "prop2": {
                    "allOf": [
                           {
                             "$data": "path/to/prop1",
                             "value": 1
                           },
                           {
                             "type": "string"
                           }
                    ]
           }
     }
}


In this case "prop2" would be valid if it is a string, and if "prop1" has the value "1". (Also included the "value" proposal here). So this seems to be quite a bit more flexible to me.
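
As a rough illustration of the semantics described here, the following toy validator swaps the instance for the value found at the "$data" path before applying the remaining keywords. It is only a sketch under assumptions: paths are written as JSON Pointers such as "/prop1" (the examples above use a placeholder path), an unresolvable path fails validation, and only "type", "value", "allOf" and "properties" are supported. None of this is taken from an existing validator.

```python
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "object": lambda v: isinstance(v, dict),
}


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer such as "/prop1" against document."""
    current = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def is_valid(schema, instance, root):
    """Validate instance, letting "$data" replace it with data found elsewhere in root."""
    if "$data" in schema:
        try:
            instance = resolve_pointer(root, schema["$data"])
        except (KeyError, IndexError, ValueError, TypeError):
            return False  # assumption: an unresolvable path fails validation
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    if "value" in schema and instance != schema["value"]:
        return False
    if any(not is_valid(sub, instance, root) for sub in schema.get("allOf", [])):
        return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(sub, instance[name], root):
                return False
    return True


def validate(schema, document):
    return is_valid(schema, document, document)


first = {"properties": {"prop1": {}, "prop2": {"$data": "/prop1", "type": "number"}}}
second = {
    "properties": {
        "prop1": {},
        "prop2": {"allOf": [{"$data": "/prop1", "value": 1}, {"type": "string"}]},
    }
}
print(validate(first, {"prop1": 5, "prop2": "anything"}))
print(validate(second, {"prop1": 2, "prop2": "text"}))
```

Note that with the first schema the content of "prop2" never influences the result; only "prop1" does.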

And yes, I'd still like to have "value" as a more explicit shortcut to enum["single value"].

  • "enumNames" - I like this, especially when combined with the multi-lingual meta-data.
Still I think this is weird, and not very flexible. What if I need an additional description. I think I might even ditch usage of "enum" completely in favor of "anyOf". This addition does not make it much better imho.

Geraint

Oct 20, 2013, 5:22:03 AM
to json-...@googlegroups.com
On Saturday, October 19, 2013 6:50:56 PM UTC+1, frontend_dev wrote:
So, in this case "prop2" would be valid if "prop1" is a number.

I think this is a good illustration of why I find it weird this way around.

If a schema says "prop2" is invalid, then that (to my mind) should be something that I can fix by changing "prop2".  However, in this example, changing the supposedly invalid data ("prop2") has no effect whatsoever.  In order to make "prop2" valid, I have to change "prop1".

(If I think this is strange, you can imagine how I feel about the idea of referencing external URIs - you could end up with a document that was invalid, but which you could only 'fix' by changing a completely separate document which you may or may not even have control over!)

If I was actually explaining to a human what the problem was, I would say "If prop1 is defined, then prop2 must be a number".  So this particular constraint could be re-phrased as:
{
    "switch": [{
        "if": {
            "required": ["prop1"]
        },
        "then": {
            "properties": {
                "prop2": {"type": "number"}
            }
        }
    }]
}
This re-phrasing can happen in general as long as the two interrelated instances are within the same document.
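
A small sketch of how a validator might evaluate the "switch" schema above. The semantics are an assumption for illustration (the first case whose "if" matches decides the outcome via its "then"), and only the keywords used in the example are implemented.

```python
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def is_valid(schema, instance):
    """Toy validator supporting "type", "required", "properties" and "switch"."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(sub, instance[name]):
                return False
    for case in schema.get("switch", []):
        if "if" in case and not is_valid(case["if"], instance):
            continue  # condition not met, try the next case
        if not is_valid(case.get("then", {}), instance):
            return False
        break  # assumption: the first matching case decides
    return True


schema = {
    "switch": [
        {
            "if": {"required": ["prop1"]},
            "then": {"properties": {"prop2": {"type": "number"}}},
        }
    ]
}
print(is_valid(schema, {"prop1": "x", "prop2": "not a number"}))
```

Here the failure belongs to the object that holds both properties, and no lookup outside the instance being validated is needed.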
  • "enumNames" - I like this, especially when combined with the multi-lingual meta-data.
Still I think this is weird, and not very flexible. What if I need an additional description. I think I might even ditch usage of "enum" completely in favor of "anyOf". This addition does not make it much better imho.

Replacing with "oneOf" is a good option.  My concern is simply that it seems slightly more convoluted to me.  Instead of "There are a fixed number of possible values for this instance", you end up saying "This instance could follow one of these five separate things - and if you look closely at each of them, then you find that each of them only allows one constant value".  An "enum" is a familiar concept, so there is an advantage in using it in these cases.
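
For comparison, here is a sketch of the mechanical rewrite from "enum" plus "enumNames" to "oneOf". It assumes "enumNames" is an array of display names parallel to "enum" and maps each name to a "title"; the function name is made up.

```python
def enum_to_one_of(schema):
    """Rewrite "enum" (plus optional "enumNames") as a "oneOf" of single-value enums."""
    values = schema["enum"]
    names = schema.get("enumNames", [None] * len(values))
    rest = {k: v for k, v in schema.items() if k not in ("enum", "enumNames")}
    options = []
    for value, name in zip(values, names):
        option = {"enum": [value]}
        if name is not None:
            option["title"] = name  # each branch could also carry a "description"
        options.append(option)
    return {**rest, "oneOf": options}


size = {"type": "string", "enum": ["s", "m"], "enumNames": ["Small", "Medium"]}
print(enum_to_one_of(size))
```

The result is longer, but each branch is a full schema and can hold whatever extra meta-data is needed.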

However, I am now leaning less towards "enumNames".

frontend_dev

Oct 20, 2013, 10:54:53 AM
to json-...@googlegroups.com
If a schema says "prop2" is invalid, then that (to my mind) should be something that I can fix by changing "prop2".  

Why? How can you "fix" that by only looking at "prop2", if the validity of a property is actually depending on something else?

That is a real requirement we have over and over, it is just the way it is.
 

However, in this example, changing the supposedly invalid data ("prop2") has no effect whatsoever.  In order to make "prop2" valid, I have to change "prop1".

Yes, of course, that's the way how it is simply defined in the data model. How is that weird?
 

(If I think this is strange, you can imagine how I feel about the idea of referencing external URIs - you could end up with a document that was invalid, but which you could only 'fix' by changing a completely separate document which you may or may not even have control over!)

No, that depends on the situation. So let's make up a (well, rather artificial) example where a user could choose some clothes. So, the choosing of clothes is dependent on the outside temperature, which we get from an external source. Here's the schema:

{
  "properties": {
    "clothing": {
      "type": "string",
      "anyOf": [
        {
          "allOf": [
            {
              "$data": "http://path/to/external/temperature",
              "minimum": 12
            },
            { "value": "tshirt" }
          ]
        },
        {
          "allOf": [
            {
              "$data": "http://path/to/external/temperature",
              "maximum": 15
            },
            { "value": "fur coat" }
          ]
        }
      ]
    }
  }
}
 
So, that would mean that the user can select a T-Shirt when the outside temp is at a minimum of 12 degrees, and can also choose to carry a fur coat if the temp is below 15 degrees, so for a temp of 13 degrees actually both choices would be valid. (of course you could also use "switch" for that, both approaches should work)
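
The intended outcome can be spelled out with a few lines of Python. This sketch hard-codes the two branches of the schema instead of fetching the temperature, and treats "minimum" and "maximum" as inclusive bounds (so 15 degrees still allows the fur coat); the function name is hypothetical.

```python
def allowed_clothing(temperature):
    """Return the clothing values the schema above would accept at this temperature."""
    branches = [
        ({"minimum": 12}, "tshirt"),
        ({"maximum": 15}, "fur coat"),
    ]
    allowed = set()
    for bounds, value in branches:
        if "minimum" in bounds and temperature < bounds["minimum"]:
            continue
        if "maximum" in bounds and temperature > bounds["maximum"]:
            continue
        allowed.add(value)
    return allowed


for temp in (5, 13, 20):
    print(temp, sorted(allowed_clothing(temp)))
```

The same "clothing" value can therefore be accepted or rejected depending on the temperature retrieved at validation time.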

So - since you are not God or doing actual weather manipulation - you can't change the outside temp, right? So, that is just the way it is. Of course, the user can still make his entry valid if he just chooses the appropriate clothing. And if wearing T-Shirts under 12 degrees is disallowed, he simply cannot choose a T-Shirt - it is that easy. Imho it is all down to the actual requirements.

           
If I was actually explaining to a human what the problem was, I would say "If prop1 is defined, then prop2 must be a number".  So this particular constraint could be re-phrased as:

That might work for that particular example. But how do I access another property that is not a sibling, but rather some levels deeper and for other attributes than required?
 

> An "enum" is a familiar concept, so there is an advantage in using it in these cases.

Yeah I agree in general, but it is too inflexible when needing additional information for the single enum values. And I think adding something like "enumNames" is only a half way solution. So, next we need "enumDescription" and so on. I think "oneOf" might be more verbose, but also much more versatile.

frontend_dev

Oct 20, 2013, 11:56:24 AM
to json-...@googlegroups.com
Aaaand another example:

Let's say I have an app where I can manage my stock portfolio. So, apart from buying and selling I can also set some kind of "stop-loss". Now, setting a stop-loss with a value _above_ the current quote does not make much sense, right? So in this case I would need to pull the stock quote from an external source, and compare that to the user's input. So how on earth should the user be able to set a _proper_ stop-loss without having access to that external data? Again, this is real life, and just the way it is.

And what I really do not get is how all you guys didn't stumble over this kind of stuff before. Do your real projects really have only data that is completely isolated and not connected to other pieces of data in any way? Cannot imagine that. Think JSON Schema is not meant only to be a rather academic thing, right? So I am still baffled about the ability to reference additional schemas (which can also be from external source) but no proper way to reference data? And now, a rather half baked solution which allows this .... a little bit. Why? I do not get it.

Also, the reservations about implementation et al. I still think my solution is much more flexible / general and also can be explained in just one sentence: "If any schema encounters a "$data" property, evaluate the given path and use the value retrieved from that instead of the implicit value that would be normally used".

That's it! Just so simple. Now, can you also explain your proposal in one sentence? And I asked many times, but I am still waiting for an explanation why this should lead to _any_ problems with implementing. In fact should be quite easier than with your proposal, which is in fact just a modified, and imho very limited version of my own idea!

And as for "enum", I would do this: "An enum can contain either a value or a schema".

And an object is _not_ a value (in my world), it is an object. If you need to describe an object as an enum value, use a schema. That's it.


PS: for all the above: no intention of being offensive, just want to express my honest opinion.

Eric G

Oct 20, 2013, 9:18:50 PM
to json-...@googlegroups.com
frontend_dev,

I really appreciate your concrete examples, they are interesting food for thought about what JSON Schema currently is and isn't useful for. 

It seems there are two basic issues, (1) how to model "variable" conditions that involve instance data (e.g. comparisons between static values and instance data or between two values in the instance data); and (2) whether or not to allow external references (URIs) or only refer to data within a single instance.

On the second question, it's an interesting proposal to open it up to pull data from anywhere, but fraught with complications. As I'm sure you realize, a URI is not enough information by itself to pull in data. Even if you expanded $data to specify all the REST metadata included in "links", even if you limited resources to http and application/json and JSON Pointer style fragment resolution, etc., etc. you wouldn't adequately specify how to communicate with 99% of web services out there. JSON Schema is one little corner of the world trying to change this (in my mind horrible) situation, but thinking about it realistically, as you say: JSON Schema can't really specify a generic mechanism for pulling data from external sources.  So we are left with (typically) the server doing the manual work of communicating with external services and processing that data to include it in responses to the client -- i.e., in the instance data.

On the first question, my own preference (as I mentioned before) is to model comparisons not in terms of properties being valid or invalid but in terms of the object that has the propert(ies) being compared.  I see them as kind of like "dependencies that involve instance data", if that makes any sense. When you're looking at error messages, something like  "At some/path/of/instance, prop1 is defined but prop2 is not a number"   or   "At some/other/path, prop1 not greater than prop2"   are more useful than "At some/path/of/instance, prop2 is not a number"  or  "At some/other/path, prop1 not greater than 23" .   Geraint's switch syntax above (which does not really deal with $data references, so maybe not a good example)  makes it easier for a validator to generate the former kinds of messages.  Whereas to my mind, modelling the condition in terms of the property (either using absolute paths as you do or relative JSON pointers) makes the validator's work to generate useful error messages more difficult/specialized.  That, plus quite a bit of special-casing about where $data can and can't be used, not all of the implications being fleshed out, plus how it interacts with the new formatMinimum/formatMaximum, make me very nervous about the whole proposal.

My own preference re. $data is to limit its use to templating links, for now, and continue discussing its use in and syntax for instance-data-dependent conditions. But I expect I'm in the minority, as that seems like a very hotly needed feature.

Eric

Geraint

Oct 21, 2013, 6:59:35 AM
to json-...@googlegroups.com
On Sunday, October 20, 2013 3:54:53 PM UTC+1, frontend_dev wrote:
If a schema says "prop2" is invalid, then that (to my mind) should be something that I can fix by changing "prop2".  

Why? How can you "fix" that by only looking at "prop2", if the validity of a property is actually depending on something else?

You can't fix it just by looking at "prop2" - the thing that actually needs changing is "prop1".  And it's for this reason that I think that ending up with a validation error for "prop2" is incorrect.  It is possible to re-structure the constraints such that the validation errors end up on "prop1", which I think is much more helpful.


However, in this example, changing the supposedly invalid data ("prop2") has no effect whatsoever.  In order to make "prop2" valid, I have to change "prop1".

Yes, of course, that's the way how it is simply defined in the data model. How is that weird?
 

There is more than one way to specify the same set of constraints, and the way you express the constraints affects the nature/helpfulness of the validation errors.  I think that the syntax you are referencing is pretty much always going to give you validation errors that point at an unintuitive part of the data (e.g. "prop2"), when the part that you ac

If the two concerns are within the same document, it can always be rephrased - as Eric G says, to "model comparisons not in terms of properties being valid or invalid but in terms of the object that has the propert(ies) being compared".

So your syntax is needed to say that "prop2" is valid only when "prop1" is a string.  However, in terms of which documents are actually valid, that is completely equivalent to saying that if "prop2" is defined then "prop1" must be a string - which you can say with existing vocab (though it's even neater with "switch").

So, that would mean that the user can select a T-Shirt when the outside temp is at a minimum of 12 degrees, and can also choose to carry a fur coat if the temp is below 15 degrees, so for a temp of 13 degrees actually both choices would be valid. (of course you could also use "switch" for that, both approaches should work)

OK, the principle of validation schemas in general is that they sort values into two categories: valid and invalid.

The idea of data that is valid only in winter doesn't really fit into that principle - validation status should not change over time.  I mean, what if you update your data format, and you need to re-run using the new schema to make sure it's all correct?  If it's relying on external data, then it fails half your clothing choices because what you chose last winter is no longer suitable for summer weather.  But this data was valid at the time it was submitted - so it should be valid now.  JSON data should not go off like food in the fridge if you leave it for too long.

 
So - since you are not God or doing actual weather manipulation - you can't change the outside temp, right? So, that is just the way it is. Of course, the user can still make his entry valid if he just chooses the appropriate clothing. And if wearing T-Shirts under 12 degrees is disallowed, he simply cannot choose a T-Shirt - it is that easy. Imho it is all down to the actual requirements.

In this case, I'd say you need summer and winter schemas, expressing the differing clothing requirements.  I mean, if your API suddenly changes its requirements, then a schema change is not out of the question.
 

           
If I was actually explaining to a human what the problem was, I would say "If prop1 is defined, then prop2 must be a number".  So this particular constraint could be re-phrased as:

That might work for that particular example. But how do I access another property that is not a sibling, but rather some levels deeper and for other attributes than required?

I'm afraid I don't see the problem - I mean, I can go arbitrarily deep in either my "if" or "then", and I can use any keywords I like in either of them.
 
 
> An "enum" is a familiar concept, so there is an advantage in using it in these cases.

Yeah I agree in general, but it is too inflexible when needing additional information for the single enum values. And I think adding something like "enumNames" is only a half way solution. So, next we need "enumDescription" and so on. I think "oneOf" might be more verbose, but also much more versatile.

 
Yeah, I'm agreeing with this more now.  I might go and dig up an old thread to get the other side of this one, but "oneOf" is a reasonable solution that is both complete and doesn't require new keywords.

Geraint

Oct 21, 2013, 7:22:35 AM
to json-...@googlegroups.com
On Sunday, October 20, 2013 4:56:24 PM UTC+1, frontend_dev wrote:
Now, can you also explain your proposal in one sentence?

"If certain schema keywords contain a "$data" property, evaluate the given path and use the value retrieved from that instead of the schema value that would be normally used"

I think it might be good to clear up - I actually think our two "$data" proposals are separate and incompatible.  Mine was (as Eric G notes) based in templating links (stealing things like link titles from the data).  The application to schemas was a generalisation of that idea.
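
To show the schema-side reading in code: in the sketch below a "$data" object stands in for a keyword's value, and the referenced instance data is substituted into the schema before ordinary validation runs. The absolute JSON Pointer syntax, the list of eligible keywords and the function names are all assumptions made for illustration, not details of the written proposal.

```python
def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer such as "/larger" against document."""
    current = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def resolve_keywords(schema, root, keywords=("minimum", "maximum")):
    """Return a copy of schema with {"$data": pointer} keyword values filled in from root."""
    resolved = {}
    for key, value in schema.items():
        if key in keywords and isinstance(value, dict) and "$data" in value:
            resolved[key] = resolve_pointer(root, value["$data"])
        elif key == "properties":
            resolved[key] = {
                name: resolve_keywords(sub, root, keywords) for name, sub in value.items()
            }
        else:
            resolved[key] = value
    return resolved


schema = {
    "properties": {
        "smaller": {"type": "number", "maximum": {"$data": "/larger"}},
        "larger": {"type": "number"},
    }
}
data = {"smaller": 3, "larger": 10}
print(resolve_keywords(schema, data))
```

After substitution the schema is an ordinary one, so a failure on "smaller" is still a statement about the value of "smaller" itself.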

Your proposal doesn't cover the link-templating case, and if your constraints are all within the same document, I'm still unclear what feature it actually adds - it seems like an alternative syntax to express things which can already be expressed.

Both these ideas used the "$data" keyword, but they are using it for different things.  It is unfortunate that I'd forgotten about your suggested use of the "$data" keyword, and didn't clearly separate them.


On Monday, October 21, 2013 2:18:50 AM UTC+1, Eric G wrote:
My own preference re. $data is to limit its use to templating links, for now, and continue discussing its use in and syntax for instance-data-dependent conditions. But I expect I'm in the minority, as that seems like a very hotly needed feature.

The problem is that frontend_dev's "$data" syntax doesn't cover the link-templating use-case at all, and it actually kind of conflicts with the syntax/use that link-templating would use.  (Basically, whether we swap the stolen/referenced data into the data-side or schema-side).

Whereas I see my "$data" proposal as an extension of the link-templating case to the more general schema situation.  So I think that just accepting the link-templating stuff is not a neutral action.

frontend_dev

Oct 21, 2013, 9:40:09 AM10/21/13
to json-...@googlegroups.com
It seems there are two basic issues, (1) how to model "variable" conditions that involve instance data (e.g. comparisons between static values and instance data or between two values in the instance data);

Exactly, that is the main issue.
 
and (2) whether or not to allow external references (URIs) or only refer to data within a single instance.

Accessing external sources would be a mere extension to (1) with my solution without changes in syntax. This is also one thing I like about my proposal.
 

On the second question, it's an interesting proposal to open it up to pull data from anywhere, but fraught with complications. As I'm sure you realize, a URI is not enough information by itself to pull in data. Even if you expanded $data to specify all the REST metadata included in "links", even if you limited resources to http and application/json and JSON Pointer style fragment resolution, etc., etc. you wouldn't adequately specify how to communicate with 99% of web services out there.

Well, as you said we could provide "$data" with additional properties, so if it is just a string it is a mere JSON pointer, while if it is an object, we could add additional params like "method" etc. But in most cases I think this will simply be a GET with the "application/json" MIME type. Of course you should be able to use fragment resolution to extract a single value from the returned external JSON.
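
The string-or-object idea above could be normalized along these lines. This is only a sketch under stated assumptions: the property names "href", "method" and "mediaType" are hypothetical (the thread only mentions "method" and the MIME type loosely), a bare string is treated as shorthand for an object with just "href", and the function describes the request without performing it.

```python
from urllib.parse import urldefrag

# Defaults suggested in the post: a plain GET for JSON.
DEFAULTS = {"method": "GET", "mediaType": "application/json"}


def normalize_data_ref(ref):
    """Turn a "$data" value (string or object) into a request description.

    The key names used here are hypothetical; nothing in the thread fixes them.
    """
    if isinstance(ref, str):
        ref = {"href": ref}  # string shorthand
    url, fragment = urldefrag(ref["href"])
    request = dict(DEFAULTS)
    # Explicit properties on the object override the defaults.
    request.update({k: v for k, v in ref.items() if k != "href"})
    request["url"] = url
    # The fragment is kept as a JSON Pointer into the fetched document.
    request["pointer"] = fragment
    return request


print(normalize_data_ref("http://example.com/weather#/temperature"))
print(normalize_data_ref({"href": "http://example.com/quote", "method": "POST"}))
```

The fragment is separated from the URL so that a single value can later be extracted from the returned JSON, as the post suggests.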
 
JSON Schema is one little corner of the world trying to change this (in my mind horrible) situation, but thinking about it realistically, as you say: JSON Schema can't really specify a generic mechanism for pulling data from external sources.

And why should it need that? Does it have a generic mechanism for pulling external schemas?
 
If I specify an URI/URL, with method and mime type, I think that will already cover a lot.

 So we are left with (typically) the server doing the manual work of communicating with external services and processing that data to include it in responses to the client -- i.e., in the instance data.

The Server? In my case, I am talking about the client! That solution won't necessarily need a server. But of course, somebody has to communicate, but that would be then "just the way it is" again.
 
Geraint's switch syntax above (which does not really deal with $data references, so maybe not a good example)  makes it easier for a validator to generate the former kinds of messages.

Yes, but it isn't strictly necessary. So I have written my examples with "oneOf" etc., but of course you should also be able to do it with "switch".
 
 Whereas to my mind, modelling the condition in terms of the property (either using absolute paths as you do or relative JSON pointers) makes the validator's work to generate useful error messages more difficult/specialized.  That, plus quite a bit of special-casing about where $data can and can't be used, not all of the implications being fleshed out, plus how it interacts with the new formatMinimum/formatMaximum, make me very nervous about the whole proposal.

Well, I already have shown how it could work with "minimum", thus not needing something like "minimum": {"$data": "1/smaller"}

Where would you see problems with formatMinimum/formatMaximum?

My own preference re. $data is to limit its use to templating links, for now, and continue discussing its use in and syntax for instance-data-dependent conditions. But I expect I'm in the minority, as that seems like a very hotly needed feature.

Still do not see why exactly we should limit it ;) 

frontend_dev

Oct 21, 2013, 9:59:23 AM10/21/13
to json-...@googlegroups.com
You can't fix it just by looking at "prop2" - the thing that actually needs changing is "prop1".  And it's for this reason that I think that ending up with a validation error for "prop2" is incorrect.  It is possible to re-structure the constraints such that the validation errors end up on "prop1", which I think is much more helpful.

Sorry, but I must disagree ;)

So take my stock market example: I have some AAPL stock, currently trading at about $509. Let's say I want to edit a stop loss, and I set the value to $510. Of course that makes no sense and the user should really be warned about that. But still, the error occurs on "prop2" in this case: it is the user's issue, and NOT the fault of the actual stock quote. The same goes for the clothing example, based on outside temperature.

Of course in both cases there should be an error on "prop1", not the dependant source. The dependant source cannot be "guilty" by definition imho. Seems very clear to me.
 
There is more than one way to specify the same set of constraints, and the way you express the constraints affects the nature/helpfulness of the validation errors.  I think that the syntax you are referencing is pretty much always going to give you validation errors that point at an unintuitive part of the data (e.g. "part2"), when the part that you ac

As I said, an error on "part2" would be absolutely OK. Why not? An error shown with the dependant value would be plain wrong in my eyes.
 
If the two concerns are within the same document, it can always be rephrased - as Eric G says, to "model comparisons not in terms of properties being valid or invalid but in terms of the object that has the propert(ies) being compared".

OK, so how would you do it with the following data:

{
  "obj1" : {
     "mainProp" : {
       "subProp": {
         "subSubDependant": true
       }
    }
  },
  "obj2" : {
    "mainProp" : {
      "subProp": {
        "subSubProp": ...
      }
    }
  }
}

 
Now, the value "obj2/mainProp/subProp/subSubProp" is dependant on "obj1/mainProp/subProp/subSubDependant".

How would you express that with the current schema, and WITHOUT evaluating everything from "root". We already discussed that, and that would lead to constructs that are very convoluted, and very hard to parse as well. I want to express the dependency on the property level, and not somewhere else. (similar to the issue I have with "required", but in many cases, much worse)

  
So your syntax is needed to say that "prop2" is valid only when "prop1" is a string.  However, in terms of which documents are actually valid, that is completely equivalent to saying that if "prop2" is defined then "prop1" must be a string - which you can say with existing vocab (though it's even neater with "switch").
 
In this case, but not necessarily in others.
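
For reference, the rephrasing quoted above (if "prop2" is defined then "prop1" must be a string) might look roughly like this with existing v4 keywords. It is a sketch, not a complete validator: the schema reads "must be a string" as "must be present and a string", and the helper is_valid is a hypothetical function that handles only the keywords used here.

```python
# Either "prop2" is absent, or "prop1" is present and a string.
SCHEMA = {
    "anyOf": [
        {"not": {"required": ["prop2"]}},
        {"required": ["prop1"], "properties": {"prop1": {"type": "string"}}},
    ]
}


def is_valid(schema, instance):
    """Check an instance against the few keywords used in SCHEMA."""
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if "required" in schema and isinstance(instance, dict):
        if any(name not in instance for name in schema["required"]):
            return False
    if "properties" in schema and isinstance(instance, dict):
        for name, sub in schema["properties"].items():
            if name in instance and not is_valid(sub, instance[name]):
                return False
    if "not" in schema and is_valid(schema["not"], instance):
        return False
    if "anyOf" in schema and not any(is_valid(s, instance) for s in schema["anyOf"]):
        return False
    return True


print(is_valid(SCHEMA, {"prop1": 5}))                # no "prop2", so no constraint
print(is_valid(SCHEMA, {"prop1": "a", "prop2": 1}))  # "prop1" is a string
print(is_valid(SCHEMA, {"prop1": 5, "prop2": 1}))    # "prop1" is not a string
```

Note that a failure here is reported against the object as a whole, which is exactly the point of disagreement in this exchange.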
 

The idea of data that is valid only in winter

No, it is valid based on temperature, not season. And that is only _one_ example.
 

doesn't really fit into that principle - validation status should not change over time. 

Ah, and why not? Hey, do not ignore reality ;) What about the stock market example? There is surely some data whose requirements would change over time. Again, it is just the way it is, and JSON schema should respect that.
 

I mean, what if you update your data format, and you need to re-run using the new schema to make sure it's all correct?  If it's relying on external data, then it fails half your clothing choices because what you chose last winter is no longer suitable for summer weather.  But this data was valid at the time it was submitted - so it should be valid now.

Yeah, because back on that day, I chose the fur. But now it is not the same day. Do you have the same clothes on all year, every day?

Sorry, I do not get your argument ;)
 

JSON data should not go off like food in the fridge if you leave it for too long.

?
 
In this case, I'd say you need summer and winter schemas, expressing the differing clothing requirements.  I mean, if your API suddenly changes its requirements, then a schema change is not out of the question.

You can use subschemas, right, but it has nothing to do with "summer" or "winter"
 

I'm afraid I don't see the problem - I mean, I can go arbitrarily deep in either my "if" or "then", and I can use any keywords I like in either of them.

But right now, only with loads of additional, awkward and hard to read, as well as VERY hard to parse syntax. Again, how would it look like with the example from just above?
 

Yeah, I'm agreeing with this more now.  I might go and dig up an old thread to get the other side of this one, but "oneOf" is a reasonable solution that is both complete and doesn't require new keywords.

I think: either ditch "enum" completely, or modify it like I proposed.

Geraint

Oct 21, 2013, 10:02:12 AM10/21/13
to json-...@googlegroups.com
On Monday, October 21, 2013 2:40:09 PM UTC+1, frontend_dev wrote:

Well, I already have shown how it could work with "minimum", thus not needing something like "minimum": {"$data": "1/smaller"}

I'm sorry, I've somehow lost track of the different threads where we were discussing this.

Could you re-post your example of how you would specify "smaller < larger"?  I'm afraid I can't quite see how that would work using just "$data" - did it require other new keywords as well?

frontend_dev

Oct 21, 2013, 11:05:53 AM10/21/13
to json-...@googlegroups.com
yes, I meant the example I posted just above, this one:



{
  "properties": {
    "clothing": {
      "type": "string",
      "anyOf": [
        {
          "allOf": [
            {
              "$data": "http://path/to/external/temperature",
              "minimum": 12
            },
            { "value": "tshirt" }
          ]
        },
        {
          "allOf": [
            {
              "$data": "http://path/to/external/temperature",
              "maximum": 15
            },
            { "value": "fur coat" }
          ]
        }
      ]
    }
  }
}


So this should do the same as with your proposal, or am I missing something?

And no, no additional keywords required, also applies for _all_ schema attributes, not only "minimum". That's why I like it.
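
One way to read the clothing schema above is that a schema-level "$data" swaps the instance being validated for the referenced value, so that "minimum" and "maximum" apply to the temperature rather than to the clothing string. The following sketch works under that assumption only. The "value" keyword and this use of "$data" come from the proposal in this thread, not from any published draft; validate and lookup are hypothetical names, and the external fetch is replaced by a dictionary lookup.

```python
TEMPERATURE_URL = "http://path/to/external/temperature"

# Schema for the "clothing" property, copied from the example above.
CLOTHING = {
    "type": "string",
    "anyOf": [
        {"allOf": [{"$data": TEMPERATURE_URL, "minimum": 12}, {"value": "tshirt"}]},
        {"allOf": [{"$data": TEMPERATURE_URL, "maximum": 15}, {"value": "fur coat"}]},
    ],
}


def validate(schema, instance, lookup):
    """Tiny validator for the handful of keywords used in CLOTHING."""
    # In this reading, "$data" replaces the instance under validation with
    # the referenced value; sibling keywords then apply to that value.
    if "$data" in schema:
        instance = lookup(schema["$data"])
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if "minimum" in schema and is_number and instance < schema["minimum"]:
        return False
    if "maximum" in schema and is_number and instance > schema["maximum"]:
        return False
    if "value" in schema and instance != schema["value"]:
        return False
    if "allOf" in schema and not all(validate(s, instance, lookup) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(validate(s, instance, lookup) for s in schema["anyOf"]):
        return False
    return True


warm = {TEMPERATURE_URL: 20}.get  # stand-in for fetching the external value
cold = {TEMPERATURE_URL: 5}.get

print(validate(CLOTHING, "tshirt", warm), validate(CLOTHING, "fur coat", warm))
print(validate(CLOTHING, "tshirt", cold), validate(CLOTHING, "fur coat", cold))
```

With a temperature of 20 only "tshirt" passes, and with 5 only "fur coat" does. Because the two ranges overlap, both pass between 12 and 15.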

frontend_dev

Oct 21, 2013, 11:10:08 AM10/21/13
to json-...@googlegroups.com
"If certain schema keywords contain a "$data" property, evaluate the given path and use the value retrieved from that instead of the schema value that would be normally used"

Yeah, but this is not complete: what is "certain"? That sounds pretty ambiguous. You would also have to explain that, and at least your sentence will be much longer then, right? ;)
 
I think it might be good to clear up - I actually think our two "$data" proposals are separate and incompatible.

I am not sure. At least with your "minimum" example, it also seems to work with my proposal. So, if I haven't overlooked something, this should include _all_ of your own proposals.
 
Your proposal doesn't cover the link-templating case, and if your constraints are all within the same document, I'm still unclear what feature it actually adds - it seems like an alternative syntax to express things which can already be expressed.

I admit I haven't looked at link templating in conjunction with $data yet. Can you give a concrete example where you would see any problems?

Geraint

Oct 21, 2013, 11:12:04 AM10/21/13
to json-...@googlegroups.com
On Monday, October 21, 2013 4:05:53 PM UTC+1, frontend_dev wrote:
yes, I meant the example I posted just above, this one:

And no, no additional keywords required, also applies for _all_ schema attributes, not only "minimum". That's why I like it.


Ah, sorry - you said: "thus not needing something like "minimum": {"$data": "1/smaller"}"

I thought you were referring to the example from the wiki page which uses that exact syntax, demonstrating a constraint where the "larger" property has to be larger than the "smaller" property.  This is a constraint I have seen a few questions about, and is currently impossible to specify.  I thought you were claiming that your syntax could also express this constraint, and I wasn't sure how that worked.

Geraint

Oct 21, 2013, 11:18:35 AM10/21/13
to json-...@googlegroups.com
On Monday, October 21, 2013 4:10:08 PM UTC+1, frontend_dev wrote:

I am not sure. At least with your "minimum" example, it also seems to work with my proposal. So, if I haven't overlooked something, this should include _all_ of your own proposals.

Ah!  So, does the behaviour of your "$data" vary depending on where you use it?

So if you use it inside "minimum", then it extracts the value and then uses it as the minimum-value-constraint for the actual data (in this case, "larger"?), but if you use it inside a schema, then it extracts the value and uses it as the instance for validation?

I'll admit to being a little confused here.

frontend_dev

Oct 21, 2013, 12:26:10 PM10/21/13
to json-...@googlegroups.com

Ah!  So, does the behaviour of your "$data" vary depending on where you use it?

All that $data does with my proposal is to replace the implicitly derived value with the one specified with "$data".
 

So if you use it inside "minimum", then it extracts the value and then uses it as the minimum-value-constraint for the actual data (in this case, "larger"?), but if you use it inside a schema, then it extracts the value and uses it as the instance for validation?

Well - if I get it right - the difference is this:

You say: "The minimum of value B should be greater than the _minimum_ (defined in its schema) for value A"
while I say: "The minimum of value B should be greater than the (current!) _value_ of  A"

And I think the latter one is really what I would care about in such a case. The advantage I see is that the latter approach can use _all_ of the schema features, while with your proposal it is just limited to _some_. Plus, I do not have to care about the schema of value A, since it is actually the _value_ I care about.

And in addition, I still see that even both of these uses for "$data" could be possible. But after all, what at least _I_ need is to express relationships between data, not its schemas. Of course the latter might still be useful _in addition_. So for example, like this:

"allOf": [
  {
    "$data": "path/to/value1",
    "value": true
  },
  {
    "minimum": { "$data": "path/to/value2/smaller" }
  }
]


Yeah, so why not? In this example, the property would be valid if a) "value1" is "true", and the minimum is smaller than the minimum defined in the schema for "value2".

Hope the difference is more clear now, but as you see, the meaning of "$data" would not change in essence, just the context where it is used.

Geraint

Oct 21, 2013, 1:03:36 PM10/21/13
to json-...@googlegroups.com
On Monday, October 21, 2013 5:26:10 PM UTC+1, frontend_dev wrote:

Ah!  So, does the behaviour of your "$data" vary depending on where you use it?

All that $data does with my proposal is to replace the implicitly derived value with the one specified with "$data".

The derived value of the data, or from the schema/keyword?
 
So if you use it inside "minimum", then it extracts the value and then uses it as the minimum-value-constraint for the actual data (in this case, "larger"?), but if you use it inside a schema, then it extracts the value and uses it as the instance for validation?

Well - if I get it right - the difference is this:

You say: "The minimum of value B should be greater than the _minimum_ (defined in its schema) for value A"
while I say: "The minimum of value B should be greater than the (current!) _value_ of  A"

OK, we're evidently at cross-purposes here, because I'm saying: "The minimum of B should be {the value of A}".

We are obviously misunderstanding each other.  I thought that you were saying that "$data" altered which instance was being validated, but that the rest of the schema was the same - I'm completely unclear how that can even work when used inside the "minimum" property.
 

And I think the latter one is really what I would care about in such a case. The advantage I see is that the latter approach can use _all_ of the schema features, while with your proposal it is just limited to _some_. Plus, I do not have to care about the schema of value A, since it is actually the _value_ I care about.

Yeah, I'm not trying to take values from the schema - I don't know how that got mixed-up.  I am extracting the value of A, and I am substituting that value into the schema for B.  So:

Data: {"A": 5, "B": 10}
Schema: {
    "properties": {
        "B": {
            "minimum": {"$data": "1/A"}
        }
    }
}

Schema (after substitution): {
    "properties": {
        "B": {
            "minimum": 5
        }
    }
}


The reason this has to be limited to some keywords is schemas like:
{
    "properties": {
        "$data": ...
    }
}

This is specifying a schema for a data property called "$data" - and we don't want the validator to end up substituting the value of "properties" in the schema.  Does that make sense?
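
The substitution described above can be sketched as follows. This is a minimal illustration, assuming relative JSON Pointer resolution of "1/A" (go up one level from the instance being validated, then down to "A"). The whitelist DATA_KEYWORDS and the helper names resolve_relative and substitute are hypothetical, and the walk only descends into "properties".

```python
# Keywords whose value may be a {"$data": ...} reference. This list is an
# assumption for illustration; the thread does not settle which keywords qualify.
DATA_KEYWORDS = {"minimum", "maximum", "minLength", "maxLength", "enum"}


def resolve_relative(root, path, pointer):
    """Resolve a relative pointer such as "1/A" from the instance at `path`."""
    up, _, rest = pointer.partition("/")
    levels = int(up)
    if levels > len(path):
        raise ValueError("relative pointer climbs above the document root")
    tokens = list(path[: len(path) - levels]) + (rest.split("/") if rest else [])
    value = root
    for token in tokens:
        value = value[int(token)] if isinstance(value, list) else value[token]
    return value


def substitute(schema, root, path=()):
    """Return a copy of `schema` with whitelisted "$data" references replaced."""
    result = {}
    for keyword, value in schema.items():
        if keyword in DATA_KEYWORDS and isinstance(value, dict) and set(value) == {"$data"}:
            result[keyword] = resolve_relative(root, path, value["$data"])
        elif keyword == "properties":
            # Keys here are data property names (possibly "$data" itself),
            # so they are never treated as references.
            result[keyword] = {
                name: substitute(sub, root, tuple(path) + (name,))
                for name, sub in value.items()
            }
        else:
            result[keyword] = value
    return result


data = {"A": 5, "B": 10}
schema = {"properties": {"B": {"minimum": {"$data": "1/A"}}}}
print(substitute(schema, data))

# A data property that happens to be called "$data" is left alone.
tricky = {"properties": {"$data": {"type": "string"}}}
print(substitute(tricky, {"$data": "x"}))
```

Restricting substitution to a fixed set of keywords is what keeps the second schema unambiguous: "properties" is not in the whitelist, so its "$data" key is read as a property name.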
 
And in addition, I still see that even both of these uses for "$data" could be possible. But after all, what at least _I_ need is to express relationships between data, not its schemas. Of course the latter might still be useful _in addition_. So for example, like this:

"allOf": [
  {
    "$data": "path/to/value1",
    "value": true
  },
  {
    "minimum": { "$data": "path/to/value2/smaller" }
  }
]


Yeah, so why not? In this example, the property would be valid if a) "value1" is "true", and the minimum is smaller than the minimum defined in the schema for "value2".

Hope the difference is more clear now, but as you see, the meaning of "$data" would not change in essence, just the context where it is used.

In that example just there, "$data" seems to me like it does two separate things.  When used inside "minimum", a value from the data is plucked-out and substituted, so the second schema looks like:
{
    "minimum": <value of "path/to/value2/smaller">
}
However, if that substitution happened the same way for the first sub-schema, then I'd expect us to end up with something like:
"allOf": [
  <value of "path/to/value1">,
  {
    "minimum": <value of "path/to/value2/smaller">
  } 
]
(This is actually exactly what would happen with my "$data" proposal - and I think it's awesome!)

This, however, seems to not be the behaviour you are expecting - so when "$data" is used in a schema, then the value is not substituted in place of the "$data"-object, but is used for some other purpose.  It seems to me like this other purpose is not substituting into the schema at all, but instead using it as the replacement instance for validation, which I think is completely different.

frontend_dev

Oct 21, 2013, 2:03:07 PM10/21/13
to json-...@googlegroups.com
The derived value of the data, or from the schema/keyword?

Well, data of course - after all the prop is called $data for a reason ;)
 
OK, we're evidently at cross-purposes here, because I'm saying: "The minimum of B should be {the value of A}".

That's what I am saying, and I can express that easily with my proposal, while being not limited. Just see the example above.
 
We are obviously misunderstanding each other.  I thought that you were saying that "$data" altered which instance was being validated, but that the rest of the schema was the same

Well, the "rest" can be whatever is appropriate, including subschemas evaluated by "switch", "oneOf" etc.
 
- I'm completely unclear how that can even work when used inside the "minimum" property.

Not inside, but - in my proposal - rather outside, again see my example. Don't see why it shouldn't work.
 
Yeah, I'm not trying to take values from the schema - I don't know how that got mixed-up.  I am extracting the value of A, and I am substituting that value into the schema for B.  So:

In essence, so do I.
 
The reason this has to be limited to some keywords is schemas like:
{
    "properties": {
        "$data": ...
    }
}


Why? Don't see a problem (yet). In this particular case, just write:

"properties": {
    "$data": {
        "$data": { ... }
    }
}

 
This is specifying a schema for a data property called "$data" - and we don't want the validator to end up substituting the value of "properties" in the schema.  Does that make sense?

Yeah, but that's just not the case? Or am I missing something?