Re-use in JSON Schema: a new approach for cases not currently handled


Henry Andrews

Sep 12, 2016, 1:07:10 AM
to JSON Schema

Hi folks,

  Having dug through all of the issues and wiki pages, and many old email threads, I’m confident in saying that questions of schema re-use and extension are the thorniest problems facing the JSON Schema project.  Even two years ago, when I last worked with JSON Schema, there were already two radically different approaches that were completely deadlocked.  They appear to have remained deadlocked up to the point where the proponents of each more or less stepped back from the project.


  While that conflict seems intractable, I think the real concern is that the problem(s) folks were trying to solve were never clearly articulated.  The use cases that are thrown around in those arguments (and many issues and email threads end up right back at the same arguments) are muddled or hypothetical, leading much of the argument to be about asserting that JSON Schema users shouldn’t even want these features.  


  I believe that there are several clear use cases, some of which are undesirable, and some of which are desirable.  Sorting those out allows us to take a fresh approach to the problem area and to potential solutions.


  I apologize for the length of this email.  I have done my best to break it up, but this is, as far as I can tell, the most controversial and complicated area of discussion, and it is not easily addressed.  With that in mind…


——————


Here are the use cases that I see:


1.  I want to prevent other schema authors from extending this object schema with additional properties.


2.  I want to strictly validate the set of property names in any instance of this object schema


3.  I want to extend a schema by adding more properties to an object schema

3a.  Neither the base nor the derived schema need to set additionalProperties: false

3b.  The desired “base” has additionalProperties: false as the result of either of the above use cases.


4.  I want to extend a schema by further tightening its restrictions

4a.  I want to do things like make more fields required, add or tighten min/max constraints, or otherwise layer on schema features that validate correctly independently

4b.  I want to take a schema that allows additionalProperties and disallow them


5.  I want to extend a base schema and override aspects of its validation, e.g. by replacing the schema for a particular object member with a totally different one.


6.  I want to customize the annotations of this re-usable schema (of any type) at each point of use


——————


#1, 2, 3a, and 4a are each handled just fine on their own by additionalProperties (1, 2) or allOf (3a, 4a).  While some argue that #2 should not be valid, many have articulated clear situations where fail-fast is more valuable than flexibility.
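
As a rough illustration of use cases 3a and 4a, here is a minimal sketch in Python. The `validate` function is a toy written for this example and covers only a handful of draft 4 keywords; it is not a real validator, and the schema names (`base`, `with_email`, `adult`) are made up.

```python
def validate(schema, instance):
    """Toy validator for a small draft 4 subset: allOf, properties,
    required, additionalProperties (boolean only), minimum, maximum."""
    # allOf: every subschema must accept the instance on its own.
    for sub in schema.get("allOf", []):
        if not validate(sub, instance):
            return False
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, value in instance.items():
            if name in props:
                if not validate(props[name], value):
                    return False
            elif schema.get("additionalProperties", True) is False:
                return False
    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if "minimum" in schema and instance < schema["minimum"]:
            return False
        if "maximum" in schema and instance > schema["maximum"]:
            return False
    return True


# An open base schema: no additionalProperties: false anywhere.
base = {"properties": {"name": {}, "age": {"minimum": 0}}}

# Use case 3a: add a property on top of the base.
with_email = {"allOf": [base, {"properties": {"email": {}}, "required": ["email"]}]}

# Use case 4a: tighten the base (age becomes required and must be >= 18).
adult = {"allOf": [base, {"required": ["age"], "properties": {"age": {"minimum": 18}}}]}

print(validate(with_email, {"name": "a", "email": "a@example.com"}))  # True
print(validate(with_email, {"name": "a"}))                            # False
print(validate(adult, {"name": "a", "age": 17}))                      # False
print(validate(adult, {"name": "a", "age": 30}))                      # True
```

Both cases work because each allOf branch can be checked independently of the others.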


#3b and 4b are not handled, and we’ll come back to them.


#5 is arguably invalid.  The correct solution is to refactor the schemas into a common shared base plus two divergent derived schemas, reducing the problem to use case 3 and/or 4.


#6 is not handled in draft 4, but does not impact validation at all; it is purely a description and documentation concern.  There is no fundamental type validation extension or modification with #6.


——————


And now for the more complex cases:


The combination of #1 and 3b is invalid.  Either no aspect of the closed type should be referenced at all, or (as with #5), a shared extensible base should be factored out.  This would reduce the use case back to #3a, 4a, and/or #4b.  Recreating the original “base” that is only different by being closed to further extension is use case 4b, which is not handled.


The combination of #2 and #3b is valid but not handled.  There is no way in v4 to specify whether the schema author was implementing use case #1 vs #2, so the mechanism that (properly) prevents 1+3b also (unfortunately) prevents 2+3b.  Requirements for both strict validation and extensibility are not in opposition:  The combination is fundamental to strongly typed OO languages such as C++ and Java.


Finally, #4b is not handled because attempting to add additionalProperties: false to an object schema using allOf produces an effectively impossible schema.  Each allOf branch is validated independently, so the additionalProperties: false branch cannot know what properties are defined elsewhere in the allOf, and therefore treats every property in the instance as additional.
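
The 4b failure can be reproduced with a minimal sketch. The `validate` function below is a toy covering only allOf, properties and boolean additionalProperties, and the schema names are made up for illustration.

```python
def validate(schema, instance):
    """Toy validator: allOf, properties, additionalProperties (boolean only)."""
    # Each allOf branch is checked on its own, with no view of its siblings.
    for sub in schema.get("allOf", []):
        if not validate(sub, instance):
            return False
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for name, value in instance.items():
            if name in props:
                if not validate(props[name], value):
                    return False
            elif schema.get("additionalProperties", True) is False:
                # Not declared in *this* schema object, so it counts as additional.
                return False
    return True


base = {"properties": {"name": {}, "age": {}}}

# Attempted use case 4b: "close" the base by adding a second allOf branch.
closed = {"allOf": [base, {"additionalProperties": False}]}

print(validate(base, {"name": "a"}))    # True
print(validate(closed, {"name": "a"}))  # False: the second branch declares no properties
print(validate(closed, {}))             # True: only an object with no members passes
```

The second branch rejects "name" even though the first branch declares it, which is exactly the behavior described above.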


——————


So.  Assuming we agree that they are within the scope of JSON Schema, we need to address use cases #2+3b, 4b, and 6.


#2+3b:  First, we need to make it possible to distinguish between use cases #1 and #2.    Currently we do not have inheritance (allOf is more of an intersection operator).  So I would say that “additionalProperties: false” on its own should indicate #2:  the desire for strict validation of property names.  That is its clearest current effect, and many, many JSON Schema users have indicated that this is why they use “additionalProperties: false”.


To indicate use case #1, we could add a boolean keyword such as “inheritable” which is true by default (this goes with {} being the most permissive possible schema).


Now, use case #3b would be handled by introducing “inherits” (I’m avoiding “extends” as it was in v3).  The implementation of “inherits” would be similar to allOf except that additionalProperties: false, whether it appears on the base or derived object schema, would apply to the union of properties at that object level.


This would also support use case #4b, as inheriting an object schema with just {“additionalProperties”: false} would produce the desired effect of locking down the property names without otherwise changing the base.
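
One possible reading of the proposed semantics, as a minimal sketch. “inherits” and “inheritable” are only proposals and exist in no draft. The list shape of “inherits”, the helper names, and the choice to raise an error on a non-inheritable base are all assumptions made for this example. Only property names and “required” are checked.

```python
def declared_names(schema):
    """Union of property names declared by a schema and everything it inherits."""
    names = set(schema.get("properties", {}))
    for parent in schema.get("inherits", []):
        # Hypothetical rule: a base marked inheritable: false may not be inherited.
        if parent.get("inheritable", True) is False:
            raise ValueError("schema is marked inheritable: false")
        names |= declared_names(parent)
    return names


def required_names(schema):
    """Union of required names across the inheritance chain."""
    names = set(schema.get("required", []))
    for parent in schema.get("inherits", []):
        names |= required_names(parent)
    return names


def is_closed(schema):
    """True if additionalProperties: false appears on the base or the derived schema."""
    if schema.get("additionalProperties", True) is False:
        return True
    return any(is_closed(parent) for parent in schema.get("inherits", []))


def validate_names(schema, instance):
    """Check the property names of an object instance under the sketched rules."""
    if not required_names(schema) <= set(instance):
        return False
    # The closed check applies to the union of names, not to each schema alone.
    if is_closed(schema) and not set(instance) <= declared_names(schema):
        return False
    return True


# Use case 2+3b: a strict base extended with one more property.
strict_base = {"properties": {"name": {}}, "additionalProperties": False}
derived = {"inherits": [strict_base], "properties": {"email": {}}}
print(validate_names(derived, {"name": "a", "email": "b"}))  # True
print(validate_names(derived, {"name": "a", "phone": "1"}))  # False

# Use case 4b: lock down an open base without changing anything else.
open_base = {"properties": {"name": {}}}
locked = {"inherits": [open_base], "additionalProperties": False}
print(validate_names(locked, {"name": "a"}))            # True
print(validate_names(locked, {"name": "a", "x": 1}))    # False
```

The difference from allOf lies in the closed check, which runs once against the combined set of names instead of once per branch.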


There would be some subtleties to work out if we want “inherits” to have special (defined as ‘unlike “allOf”’) behavior for schema elements other than object properties.  This also produces a situation where either “inherits” or “allOf” could work for certain cases, and we may want to provide some guidance on that.  It needs a bit more thought.


——————


I should address one of the major proposals here, “ban additional properties mode”, which changes the behavior of validation for the entire schema.  In addition to making it impossible to tell how the validation will behave solely from the schema, this approach largely inverts the meaning of {} from the most permissive schema to a schema that will not validate any object, although it may still validate other types successfully.


Even aside from those problems, “ban additional properties mode” is essentially a debug mode, per the author’s description of enabling it during development and disabling it in production.  This means it does not address #2+3b or 4b at all.  Prioritizing fail-fast over flexibility is a common tradeoff and must be supported in production, not just during development.


Finally, “ban additional properties mode” doesn’t do anything at all for use case #6.


——————


Speaking of use case #6, that is the last case we need to address.  The $merge/$patch pre-processor proposal *could* handle this.  $merge/$patch in general could produce an arbitrarily transformed set of validation rules.  However, use case #6 does not involve validation fields so the validation behavior would not be affected.  We would either have to specify or strongly recommend that $merge/$patch not be used to modify non-annotation fields.


The other $merge/$patch problem is addressing the resulting schema.  Since use case #6 does not affect validation, from the validator’s point of view a $merge is no different than a $ref to the source schema, and should be handled the same way for addressing when needed.


Alternatively, to avoid having to attempt to restrict a general JSON-editing technique to specific fields, we could add a keyword such as “usage” or “annotations”.  This would be an object that only contained annotation fields, with the behavior that any fields in “usage” overwrite the same fields in $ref’d, allOf, anyOf, oneOf, etc. schemas at the same level.
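
A minimal sketch of how such a keyword might be resolved. “usage” is only a proposal. The set of annotation keywords, the `definitions` lookup table standing in for real $ref resolution, and the function name are assumptions made for this example. The sketch looks only one level deep.

```python
# Assumed set of annotation fields (the draft 4 metadata keywords).
ANNOTATION_KEYWORDS = ("title", "description", "default")


def effective_annotations(schema, definitions):
    """Collect annotations at one level; "usage" entries overwrite inherited ones."""
    usage = schema.get("usage", {})
    unknown = set(usage) - set(ANNOTATION_KEYWORDS)
    if unknown:
        # "usage" may hold annotation fields only, so validation cannot change.
        raise ValueError("non-annotation fields in usage: %s" % sorted(unknown))

    # Gather the schemas referenced at this level ($ref, allOf, anyOf, oneOf).
    referenced = []
    if "$ref" in schema:
        referenced.append(definitions[schema["$ref"]])
    for keyword in ("allOf", "anyOf", "oneOf"):
        referenced.extend(schema.get(keyword, []))

    merged = {}
    for sub in referenced:
        for key in ANNOTATION_KEYWORDS:
            if key in sub:
                merged[key] = sub[key]
    merged.update(usage)  # the point-of-use values win
    return merged


definitions = {
    "#/definitions/address": {
        "title": "Address",
        "description": "A postal address",
        "properties": {"street": {}},
    }
}

billing = {"$ref": "#/definitions/address", "usage": {"title": "Billing address"}}
print(effective_annotations(billing, definitions))
# {'title': 'Billing address', 'description': 'A postal address'}
```

Because the function never touches validation keywords, the validator can keep treating `billing` exactly like a plain $ref to the address schema.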


This also does not change the validation behavior at all, and avoids adding more rules of addressing across preprocessors.  I confess I somewhat prefer this idea, as it is less open to abuse.  It also allows anyone to still use the $merge/$patch ideas across the JSON world, including with JSON Schema if they desire.  It would just not be a proper JSON Schema until after the pre-processing.  Kind of like how you can do really crazy tricks with the C pre-processor if you really want to.  But you generally shouldn’t.


——————


I feel fairly confident that this set of use cases, or something very much like it, is key to resolving this problematic area.  I feel that they clarify the problems with both of the major proposals and allow us to focus on specifically targeting solutions to situations that we consider to be valid forms of re-use.  I am not as strongly attached to my specific solutions, although I think they make a good starting point.


I am looking forward to any and all feedback, positive or negative :-)


thanks,

-henry

Henry Andrews

Sep 12, 2016, 2:13:09 AM
to JSON Schema
This proposal, which I saw at some point and then forgot about, is not unlike the "inherits" approach, and it covers a lot more cases: https://github.com/json-schema/json-schema/wiki/Schema-merging

[btw, for some reason my original message is showing up weirdly double-spaced in the Google Groups web UI- sorry about that]

cheers,
-henry

Austin William Wright

Sep 13, 2016, 10:38:00 AM
to JSON Schema
This is so awesome.

I was planning on writing up an analysis like this in order to study how JSON APIs use features similar to OO subclassing (and how different languages handle it), but you seem to have gotten everything I would be able to think of off-the-bat.

Idk when we'll be able to use this, but certainly for the next release that adds new features.

Thanks,

Austin.


On Sunday, September 11, 2016 at 10:07:10 PM UTC-7, Henry Andrews wrote:

Henry Andrews

Sep 13, 2016, 11:13:37 PM
to JSON Schema
Great!  I'm really glad this is useful :-)

I think it would make sense for me to break this up into a few v6 proposals and file them.  Given the contentious history, it would be good to work past at least the first round of objections while we're mostly focusing on v5.  Maybe something like:

* "inherits"/"inheritable" (taking into account at least some of the ideas from the "includes" proposal, although that is not exactly what I am going for)

* "using" (or similar) for overwriting annotations at point-of-use

plus adding comments about them to the existing (or about to be ported to the new repo) $merge/$patch and ban-strict-properties mode proposals.  Either of these two new proposals could be accepted or rejected independent of the other.

Is there somewhere good to record suggested use cases?  I don't want to rewrite this entire long thing in each issue.  I could see adding a page for this on the new web site- just point me to where and I'll make a pull request.  Or create a wiki page in the new spec repo?  Whichever, just let me know.

thanks,
-henry

Henry Andrews

Sep 14, 2016, 12:00:24 AM
to JSON Schema
Random thought on the choice of "inherits":  using inheritance terminology is somewhat perilous as JSON Schema is not an object-oriented programming language, and the conceptual parallels are limited.

Another possible term would be "expands".  The difference between this and either the current "allOf" or the old "extends" in draft 03 is that it *adds* possible properties to an otherwise closed set.  The "inheritable" keyword would obviously also have to change, probably to "expandable".  Note that "inheritable"/"expandable": false has no meaning unless "additionalProperties" is also false.

Any opinions on this name choice?

thanks,
-henry

ham...@gmail.com

Aug 18, 2017, 9:52:13 AM
to JSON Schema

I'm a newcomer to json-schema who has just hit the `anyOf+additionalProperties` problem.

There's a page in the documentation (https://spacetelescope.github.io/understanding-json-schema/reference/combining.html#allof) which states "This shortcoming is perhaps one of the biggest surprises of the combining operations in JSON schema: it does not behave like inheritance in an object-oriented language. There are some proposals to address this in the next version of the JSON schema specification."

I've searched the list and found various past discussions in this area but I haven't found a clear path forward. Is there any consensus yet?

To my mind, this post by Henry Andrews cuts to the heart of the matter:

On Monday, 12 September 2016 06:07:10 UTC+1, Henry Andrews wrote:
 

First, we need to make it possible to distinguish between use cases #1 and #2.    Currently we do not have inheritance (allOf is more of an intersection operator).  So I would say that “additionalProperties: false” on its own should indicate #2:  the desire for strict validation of property names.  That is its clearest current effect, and many, many JSON Schema users have indicated that this is why they use “additionalProperties: false”.

To indicate use case #1, we could add a boolean keyword such as “inheritable” which is true by default (this goes with {} being the most permissive possible schema).


I would very much like to use `additionalProperties` for case #2 (which is "2.  I want to strictly validate the set of property names in any instance of this object schema" as opposed to "1.  I want to prevent other schema authors from extending this object schema with additional properties.")

Am I missing something in v6 which addresses this same use case?

Thanks,
Hamish

Henry Andrews

Aug 18, 2017, 4:13:12 PM
to JSON Schema
The closest thing in draft-06 is using "propertyNames" with "enum", which I explain in detail here:
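
For readers without the linked explanation at hand, the general idea can be sketched roughly as follows. This is an illustration only, not the linked write-up. The `validate` function is a toy covering just allOf, enum, propertyNames and properties, and the schema names are made up.

```python
def validate(schema, instance):
    """Toy validator: allOf, enum, propertyNames, properties."""
    for sub in schema.get("allOf", []):
        if not validate(sub, instance):
            return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if isinstance(instance, dict):
        # propertyNames: every member name must itself satisfy the given schema.
        if "propertyNames" in schema:
            if not all(validate(schema["propertyNames"], name) for name in instance):
                return False
        for name, value in instance.items():
            sub = schema.get("properties", {}).get(name)
            if sub is not None and not validate(sub, value):
                return False
    return True


base = {"properties": {"name": {}}}
extension = {"properties": {"email": {}}}

# Combine both schemas, then list the full set of permitted names by hand.
strict = {
    "allOf": [base, extension],
    "propertyNames": {"enum": ["name", "email"]},
}

print(validate(strict, {"name": "a", "email": "b"}))  # True
print(validate(strict, {"name": "a", "phone": "1"}))  # False
```

The cost is that the enum has to repeat every property name from every combined schema and must be kept in sync manually.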

If you want to look through all of the relevant further proposals, there is a label for it:

I hope to resolve this in the forthcoming draft.  I doubt there is a solution that will make everyone happy, but it's time to put a stake in the ground on this.

thanks,
-henry