Hi folks,
Having dug through all of the issues and wiki pages, and many old email threads, I’m confident in saying that questions of schema re-use and extension are the thorniest problems facing the JSON Schema project. Even two years ago, when I last worked with JSON Schema, there were already two radically different approaches that were completely deadlocked. They appear to have remained deadlocked up to the point where the proponents of each more or less stepped back from the project.
While that conflict seems intractable, I think the real concern is that the problem(s) folks were trying to solve were never clearly articulated. The use cases that are thrown around in those arguments (and many issues and email threads end up right back at the same arguments) are muddled or hypothetical, leading much of the argument to be about asserting that JSON Schema users shouldn’t even want these features.
I believe that there are several clear use cases, some of which are undesirable, and some of which are desirable. Sorting those out allows us to take a fresh approach to the problem area and to potential solutions.
I apologize for the length of this email. I have done my best to break it up, but this is, as far as I can tell, the most controversial and complicated area of discussion, and it is not easily addressed. With that in mind...
——————
Here are the use cases that I see:
1. I want to prevent other schema authors from extending this object schema with additional properties.
2. I want to strictly validate the set of property names in any instance of this object schema.
3. I want to extend a schema by adding more properties to an object schema.
3a. Neither the base nor the derived schema need to set additionalProperties: false.
3b. The desired “base” has additionalProperties: false as the result of either of the above use cases.
4. I want to extend a schema by further tightening its restrictions.
4a. I want to do things like make more fields required, add or tighten min/max constraints, or otherwise layer on schema features that validate correctly independently.
4b. I want to take a schema that allows additionalProperties and disallow them.
5. I want to extend a base schema and override aspects of its validation, e.g. by replacing the schema for a particular object member with a totally different one.
6. I want to customize the annotations of this re-usable schema (of any type) at each point of use.
——————
#1, 2, 3a, and 4a are each handled just fine on their own by additionalProperties (1, 2) or allOf (3a, 4a). While some argue that #2 should not be valid, many have articulated clear situations where fail-fast is more valuable than flexibility.
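To make the 3a and 4a cases concrete, here is a minimal sketch in Python. The validate function is a toy of my own that covers only allOf, properties, required, additionalProperties (boolean form) and minimum; it is not any real validator's API, and the schemas are made-up examples.

```python
def validate(schema, instance):
    """Toy validator for a tiny subset of draft 4 keywords (illustration only)."""
    # allOf: every branch must accept the instance on its own.
    if not all(validate(branch, instance) for branch in schema.get("allOf", [])):
        return False
    if isinstance(instance, dict):
        declared = schema.get("properties", {})
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, value in instance.items():
            if key in declared:
                if not validate(declared[key], value):
                    return False
            elif schema.get("additionalProperties", True) is False:
                return False
    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if "minimum" in schema and instance < schema["minimum"]:
            return False
    return True


# Hypothetical base schema, open to additional properties.
base = {"properties": {"name": {}, "age": {"minimum": 0}}}

# Use case 3a: add a property through allOf.
derived_3a = {"allOf": [base, {"properties": {"email": {}}}]}

# Use case 4a: tighten constraints through allOf.
derived_4a = {
    "allOf": [base, {"required": ["name"], "properties": {"age": {"minimum": 18}}}]
}

print(validate(derived_3a, {"name": "Ada", "email": "ada@example.com"}))  # True
print(validate(derived_4a, {"name": "Ada", "age": 21}))                   # True
print(validate(derived_4a, {"age": 21}))                                  # False
print(validate(derived_4a, {"name": "Ada", "age": 10}))                   # False
```

Both cases work because each allOf branch can be checked independently and the results simply intersect.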
#3b and 4b are not handled, and we’ll come back to them.
#5 is arguably invalid. The correct solution is to refactor the schemas into a common shared base plus two divergent derived schemas, reducing the problem to use case 3 and/or 4.
#6 is not handled in draft 4, but does not impact validation at all; it is purely a description and documentation concern. There is no fundamental type validation extension or modification with #6.
——————
And now for the more complex cases:
The combination of #1 and 3b is invalid. Either no aspect of the closed type should be referenced at all, or (as with #5), a shared extensible base should be factored out. This would reduce the use case back to #3a, 4a, and/or #4b. Recreating the original “base” that is only different by being closed to further extension is use case 4b, which is not handled.
The combination of #2 and #3b is valid but not handled. There is no way in v4 to specify whether the schema author was implementing use case #1 vs #2, so the mechanism that (properly) prevents 1+3b also (unfortunately) prevents 2+3b. Requirements for both strict validation and extensibility are not in opposition: The combination is fundamental to strongly typed OO languages such as C++ and Java.
Finally, #4b is not handled because attempting to add additionalProperties: false to an object schema using allOf produces an effectively impossible schema. Each allOf branch is validated independently, so the additionalProperties: false branch cannot know what properties are defined elsewhere in the allOf. It therefore thinks that no properties are defined and treats every property in the instance as additional.
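The two unhandled cases can be shown with a small sketch. Again, validate is a toy of my own that models only allOf, properties and the boolean form of additionalProperties (patternProperties is ignored), and the schemas are invented for illustration.

```python
def validate(schema, instance):
    """Toy validator: allOf, properties, boolean additionalProperties only."""
    for branch in schema.get("allOf", []):
        if not validate(branch, instance):
            return False
    if isinstance(instance, dict):
        declared = schema.get("properties", {})
        closed = schema.get("additionalProperties", True) is False
        for key, value in instance.items():
            if key in declared:
                if not validate(declared[key], value):
                    return False
            elif closed:
                # This branch only sees its own "properties", nothing from siblings.
                return False
    return True


open_base = {"properties": {"name": {}}}
closed_base = {"properties": {"name": {}}, "additionalProperties": False}

# Use case 4b attempted with allOf: the second branch declares no properties,
# so it rejects every property, including "name".
locked = {"allOf": [open_base, {"additionalProperties": False}]}

# Use case 3b attempted with allOf: the closed base rejects the new property.
extended = {"allOf": [closed_base, {"properties": {"email": {}}}]}

print(validate(locked, {"name": "Ada"}))                                # False
print(validate(locked, {}))                                             # True
print(validate(extended, {"name": "Ada", "email": "ada@example.com"}))  # False
```

Only the empty object survives the 4b attempt, which is why I describe the result as effectively impossible.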
——————
So. Assuming we agree that they are within the scope of JSON Schema, we need to address use cases #2+3b, 4b, and 6.
#2+3b: First, we need to make it possible to distinguish between use cases #1 and #2. Currently we do not have inheritance (allOf is more of an intersection operator). So I would say that “additionalProperties: false” on its own should indicate #2: the desire for strict validation of property names. That is its clearest current effect, and many, many JSON Schema users have indicated that that is why they use “additionalProperties: false”.
To indicate use case #1, we could add a boolean keyword such as “inheritable” which is true by default (this goes with {} being the most permissive possible schema).
Now, use case #3b would be handled by introducing “inherits” (I’m avoiding “extends” as it was in v3). The implementation of “inherits” would be similar to allOf except that additionalProperties: false, whether it appears on the base or derived object schema, would apply to the union of properties at that object level.
This would also support use case #4b, as inheriting an object schema with just {“additionalProperties”: false} would produce the desired effect of locking down the property names without otherwise changing the base.
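As a rough sketch of the property-name behavior I have in mind, and nothing more: the code below treats “inherits” as a list of base schemas (the list form is my assumption, as are the function names), and it checks only property names, not the property schemas themselves.

```python
def flatten(schema):
    """Collect property names and closedness across a hypothetical "inherits" chain."""
    names = set(schema.get("properties", {}))
    closed = schema.get("additionalProperties", True) is False
    for base in schema.get("inherits", []):
        # Hypothetical "inheritable" keyword: true by default, false forbids extension.
        if base.get("inheritable", True) is False:
            raise ValueError("base schema is not inheritable")
        base_names, base_closed = flatten(base)
        names |= base_names                # union of properties at this object level
        closed = closed or base_closed     # closed anywhere means closed overall
    return names, closed


def names_ok(schema, instance):
    """Check only the property names of an object instance."""
    names, closed = flatten(schema)
    return not closed or set(instance) <= names


closed_base = {"properties": {"name": {}}, "additionalProperties": False}
open_base = {"properties": {"name": {}}}
final_base = {"properties": {"name": {}}, "inheritable": False}

# Use case 2+3b: extend a strictly validated base.
derived = {"inherits": [closed_base], "properties": {"email": {}}}
# Use case 4b: lock down an open base without otherwise changing it.
locked = {"inherits": [open_base], "additionalProperties": False}

print(names_ok(derived, {"name": "Ada", "email": "ada@example.com"}))  # True
print(names_ok(derived, {"name": "Ada", "phone": "555"}))              # False
print(names_ok(locked, {"name": "Ada"}))                               # True
print(names_ok(locked, {"name": "Ada", "extra": 1}))                   # False
```

With this sketch, trying to inherit from final_base raises an error, which is how use case #1 would be expressed.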
There would be some subtleties to work out if we want “inherits” to have special (defined as ‘unlike “allOf”’) behavior for schema elements other than object properties. This also produces a situation where either “inherits” or “allOf” could work for certain cases, and we may want to provide some guidance on that. It needs a bit more thought.
——————
I should address one of the major proposals here, “ban additional properties mode”, which changes the behavior of validation for the entire schema. In addition to making it impossible to tell how the validation will behave solely from the schema, this approach largely inverts the meaning of {} from the most permissive schema to a schema that will not validate any object, although it may still validate other types successfully.
Even aside from those problems, “ban additional properties mode” is essentially a debug mode, per the author’s description of enabling it during development and disabling it in production. This means it does not address #2+3b or 4b at all. Prioritizing fail-fast over flexibility is a common tradeoff and must be supported in production, not just during development.
Finally, “ban additional properties mode” doesn’t do anything at all for use case #6.
——————
Speaking of use case #6, that is the last case we need to address. The $merge/$patch pre-processor proposal *could* handle this. $merge/$patch in general could produce an arbitrarily transformed set of validation rules. However, use case #6 does not involve validation fields so the validation behavior would not be affected. We would either have to specify or strongly recommend that $merge/$patch not be used to modify non-annotation fields.
The other $merge/$patch problem is addressing the resulting schema. Since use case #6 does not affect validation, from the validator’s point of view a $merge is no different than a $ref to the source schema, and should be handled the same way for addressing when needed.
Alternatively, to avoid having to attempt to restrict a general JSON-editing technique to specific fields, we could add a keyword such as “usage” or “annotations”. This would be an object that only contained annotation fields, with the behavior that any fields in “usage” overwrite the same fields in $ref’d, allOf, anyOf, oneOf, etc. schemas at the same level.
This also does not change the validation behavior at all, and avoids adding more rules of addressing across preprocessors. I confess I somewhat prefer this idea, as it is less open to abuse. It also allows anyone to still use the $merge/$patch ideas across the JSON world, including with JSON Schema if they desire. It would just not be a proper JSON Schema until after the pre-processing. Kind of like how you can do really crazy tricks with the C pre-processor if you really want to. But you generally shouldn’t.
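Here is a small sketch of how the “usage” idea could resolve annotations. Everything in it is an assumption for illustration: the set of annotation fields, the function name, the lookup of $ref through a plain dictionary, and the choice to silently ignore non-annotation fields inside “usage”.

```python
# Assumed set of annotation fields; the real list would need to be specified.
ANNOTATION_KEYS = ("title", "description", "default")


def annotations(schema, definitions):
    """Compute effective annotations, letting "usage" win at the point of use."""
    result = {}
    if "$ref" in schema:
        # Toy resolution: look the reference string up in a dictionary.
        result.update(annotations(definitions[schema["$ref"]], definitions))
    for branch in schema.get("allOf", []):
        result.update(annotations(branch, definitions))
    for key in ANNOTATION_KEYS:
        if key in schema:
            result[key] = schema[key]
    # "usage" overrides annotations only; validation keywords are ignored here.
    for key, value in schema.get("usage", {}).items():
        if key in ANNOTATION_KEYS:
            result[key] = value
    return result


definitions = {
    "#/definitions/address": {
        "title": "Address",
        "description": "A postal address",
        "properties": {"street": {}},
    }
}

shipping = {"$ref": "#/definitions/address", "usage": {"title": "Shipping address"}}

print(annotations(shipping, definitions))
# {'title': 'Shipping address', 'description': 'A postal address'}
```

The validation keywords of the referenced schema are never touched, which is the property that makes this less open to abuse than a general merge.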
——————
I feel fairly confident that this set of use cases, or something very much like it, is key to resolving this problematic area. I feel that they clarify the problems with both of the major proposals and allow us to focus on specifically targeting solutions to situations that we consider to be valid forms of re-use. I am not as strongly attached to my specific solutions, although I think they make a good starting point.
I am looking forward to any and all feedback, positive or negative :-)
thanks,
-henry