JSON Schema completeness

John Snyders

May 8, 2010, 11:12:18 PM

to JSON Schema

What kinds of JSON documents can JSON Schema describe? Can it validate
all possible JSON documents? No. Does it validate enough of them?

I break validation down into scalar and shape validation. By shape I
mean the properties an object can have and the items an array can have
and recursively the shape the property and item values have. Scalar
validation includes the range of string, boolean, and numeric values.

Most of the schema attributes fall in one or the other category:

Shape: properties, items, optional, additionalProperties, requires,
minItems, maxItems, uniqueItems

Scalar: minimum, maximum, minimumCanEqual, maximumCanEqual, pattern,
maxLength, minLength, enum, format, divisibleBy

Other categories:
- Control: type, disallow, extends. These attributes drive the
validation. Type discriminates between scalar values and values that
give rise to shape (arrays and objects).
- Documentation: title, description.
- contentEncoding and default have nothing to do with validation.
They tell the client processor how to interpret the document, not
whether the document is valid. I think this makes them more like the
attributes described in section 6 Hyper Schema, so I think they should
be moved there.

I think it would be useful to group the attributes together into sub
sections based on the above categories.

I think the scalar attributes do a pretty good job of covering the
most important validations. They can't handle a numeric property that
must be prime, but that is a contrived example. It is also easy to
see how an implementation could allow extensions to validate
other constraints on scalars (for example, by adding a prime: true
attribute). It might be worth formalizing an extension mechanism like
XSLT 2.0 does with extension functions.
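
As a rough illustration of such an extension mechanism, here is a minimal Python sketch. The registry, the decorator, and the function names are all hypothetical and not part of any JSON Schema draft or library; the only thing taken from the text above is the idea of a prime: true attribute.

```python
# Hypothetical registry mapping extension attribute names to check functions.
EXTENSIONS = {}

def extension(name):
    """Register a check function under an extension attribute name."""
    def register(func):
        EXTENSIONS[name] = func
        return func
    return register

@extension("prime")
def check_prime(value, expected):
    """True if the primality of value agrees with the attribute value."""
    is_prime = isinstance(value, int) and value > 1 and all(
        value % d for d in range(2, int(value ** 0.5) + 1))
    return is_prime == expected

def check_extensions(value, schema):
    """Apply every registered extension attribute present in the schema."""
    return all(check(value, schema[name])
               for name, check in EXTENSIONS.items() if name in schema)

print(check_extensions(7, {"type": "integer", "prime": True}))  # True
print(check_extensions(8, {"type": "integer", "prime": True}))  # False
```

A validator built this way would run its standard scalar checks first and then call check_extensions for anything it does not recognize natively.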

There are a few changes I would make. I would prefer limitMax and
limitMin to the maximumCanEqual and minimumCanEqual attributes (or
perhaps upperBound and lowerBound). Each would be mutually exclusive
with the corresponding maximum or minimum. For example,
"minimum": 5, "limitMax": 10 defines the range from 5 inclusive to 10
exclusive. One property, limitMax, is better than two properties,
maximum and maximumCanEqual.
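
The intended semantics can be sketched in a few lines of Python. This assumes minimum and maximum are always inclusive and limitMin and limitMax are their exclusive counterparts; the function name in_range is made up for illustration.

```python
def in_range(value, schema):
    """Check a number against the bounds proposed above.

    "minimum"/"maximum" are treated as inclusive; "limitMin"/"limitMax"
    are the proposed exclusive counterparts (not part of the draft).
    """
    if "minimum" in schema and "limitMin" in schema:
        raise ValueError("minimum and limitMin are mutually exclusive")
    if "maximum" in schema and "limitMax" in schema:
        raise ValueError("maximum and limitMax are mutually exclusive")
    if "minimum" in schema and value < schema["minimum"]:
        return False
    if "limitMin" in schema and value <= schema["limitMin"]:
        return False
    if "maximum" in schema and value > schema["maximum"]:
        return False
    if "limitMax" in schema and value >= schema["limitMax"]:
        return False
    return True

schema = {"minimum": 5, "limitMax": 10}
print(in_range(5, schema))   # True: 5 is included
print(in_range(10, schema))  # False: 10 is excluded
```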

I prefer maxDecimal to divisibleBy.

For the shape of data I think there are some important things that
RELAX NG can describe that JSON Schema cannot. I also think that it is
most important to get the shape validation right because it would be
harder for an implementation to extend JSON Schema to handle
additional cases. Here are some examples of things I don't see a way
to do in JSON Schema. Please correct me if I'm wrong.

1) An object can have properties A and B and C or property D (mutual
exclusion). I wonder if disallow can help with this. Even if so I
think it would be more confusing than what RELAX NG would do. You can
also have an object with multiple groups of mutually exclusive
properties.

2) An array can contain from 1 to 3 items of type foo followed by any
number of items of type bar. Examples like this as well as the tuple
typing treat arrays more like XML element content models. I think it
is much more common in JSON for arrays to either have all items of the
same type or allow any of the items to be one of a few types (like an
array where each element is a number or string). JSON Schema handles
the most common cases. Does it need to do the things that RELAX NG can
do?

3) An array can contain type A optionally followed by type B followed
by type C or it can contain type A followed by D and E.

4) An array with any number of string, object pairs such as [ "apple",
{...}, "kiwi", {...}, ...]

I think what is missing in JSON Schema that RELAX NG has is a way to
represent properties and items and the equivalent of choice and group.

{
  "type": "object",
  "properties": [
    { "name": "X", "type": "number", "optional": "true" },
    { "choice": [
      { "group": [
        { "name": "A", "type": {...} },
        { "name": "B", "type": {...} },
        { "name": "C", "type": {...} }
      ] },
      { "name": "D", "type": {...} }
    ] }
  ]
}

The above example is similar to example 1 above. The instance object
can have properties (A, B, and C) or D. It can also have an optional X
property.
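
To make the proposed semantics concrete, here is a minimal Python sketch of how a validator might check property names against choice and group. It is an assumption-laden illustration: it looks only at which property names are present (not at their value types), it allows no additional properties, it uses a boolean for optional, and the function names are invented.

```python
from itertools import product

def alternatives(particle):
    """Return the (required, allowed) property-name sets a particle permits."""
    if "choice" in particle:
        # A choice permits whatever any one of its branches permits.
        return [alt for branch in particle["choice"]
                for alt in alternatives(branch)]
    if "group" in particle:
        # A group permits one alternative from each member, combined.
        combos = product(*(alternatives(m) for m in particle["group"]))
        return [(frozenset().union(*(r for r, _ in combo)),
                 frozenset().union(*(a for _, a in combo)))
                for combo in combos]
    name = particle["name"]
    required = frozenset() if particle.get("optional") else frozenset([name])
    return [(required, frozenset([name]))]

def shape_matches(instance, properties):
    """True if the instance's property names fit some alternative."""
    keys = frozenset(instance)
    # The top-level properties list is treated as an implicit group.
    return any(required <= keys <= allowed
               for required, allowed in alternatives({"group": properties}))

properties = [
    {"name": "X", "optional": True},
    {"choice": [
        {"group": [{"name": "A"}, {"name": "B"}, {"name": "C"}]},
        {"name": "D"},
    ]},
]
print(shape_matches({"A": 1, "B": 2, "C": 3}, properties))  # True
print(shape_matches({"D": 1, "X": 2}, properties))          # True
print(shape_matches({"A": 1, "D": 2}, properties))          # False
```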

I still like having the properties attribute take an object
whose property names are the names of the instance properties. It is a
shorthand that can be used when the full power of choice and group
is not needed.

{
  "type": "object",
  "properties": {
    "foo": { "type": "string", "optional": "true" },
    "bar": { "type": "number" }
  }
}

is shorthand for:

{
  "type": "object",
  "properties": [
    { "name": "foo", "type": "string", "optional": "true" },
    { "name": "bar", "type": "number" }
  ]
}
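
The expansion from the object form to the list form is mechanical. A minimal Python sketch, with a hypothetical function name and a boolean used for optional:

```python
def expand_properties(properties):
    """Convert the object shorthand into the list-of-particles form."""
    if isinstance(properties, list):
        return properties  # already in the long form
    # Each property name becomes the "name" of its own particle.
    return [{"name": name, **schema} for name, schema in properties.items()]

short = {
    "foo": {"type": "string", "optional": True},
    "bar": {"type": "number"},
}
print(expand_properties(short))
```

A validator could run this once up front and then deal only with the list form.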

I see requires as a very strange, special-purpose attribute. With
choice and group it is not needed. Here is the equivalent of the
example from the draft.

{
  "type": "object",
  "properties": [
    { "choice": [
      { "group": [
        { "name": "town" },
        { "name": "state" }
      ] },
      { "name": "state", "optional": "true" }
    ] }
  ]
}

The case where the value of requires can be a schema seems confusing
and possibly dangerous to me (if I understand it correctly). I think
"containing instance" means the instance object that has the property
described by the schema carrying the requires attribute. The schema
value of requires is out of place when reading the schema, since
it applies not to the property value but to the object that contains
the property. It allows you to say strange things in complicated ways.
For example: if an object has property x then it is required to not
have property x. I think choice and group are much clearer.

The following is how I think example 3 above could be handled.

{
  "type": "array",
  "items": [
    { "type": { "$ref": /*ref to type A*/} },
    { "choice": [
      { "group": [
        { "type": { "$ref": /*ref to type B*/}, "optional": "true" },
        { "type": { "$ref": /*ref to type C*/} }
      ] },
      { "group": [
        { "type": { "$ref": /*ref to type D*/} },
        { "type": { "$ref": /*ref to type E*/} }
      ] }
    ] }
  ]
}

(In rereading this, the "ref to type" stuff is not clear. I'm trying
to refer to a type defined elsewhere by name. Perhaps I want extends.
I'm trying to reuse a type definition and at the same time add
attributes such as optional that apply to the item.)

I don't think additionalProperties should apply to arrays. First, the
name is misleading since this is an array not an object. Second, if
the instance can be an array or an object then this property could
apply differently to each. If additionalProperties allows additional
arbitrary items in the tuple array how do you keep the correspondence
between the schema array and the instance array?

I think choice and group can do more than additionalProperties. To
allow additional arbitrary items, just include an item with type: any.
Also, minItems and maxItems should refer to repetitions of the item.

Example 2 above could be handled like this:

{
  "type": "array",
  "items": [
    { "type": { "$ref": /*ref to type foo*/}, "minItems": 1,
      "maxItems": 3 },
    { "type": { "$ref": /*ref to type bar*/}, "minItems": 0 }
  ]
}

The optional attribute could also apply to items in an array. It would
be shorthand for minItems: 0, maxItems: 1.
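
Here is a minimal Python sketch of how per-item repetition could be matched, using backtracking. The defaults are assumptions (an item with neither bound occurs exactly once; a missing maxItems means unlimited), "string" and "number" stand in for the foo and bar types, and the function names are invented.

```python
TYPES = {"string": str, "number": (int, float), "object": dict}

def bounds(item):
    """Repetition bounds; the defaults are assumptions of this sketch."""
    if "minItems" not in item and "maxItems" not in item:
        return 1, 1  # plain item: exactly one occurrence
    return item.get("minItems", 0), item.get("maxItems")  # None = unlimited

def match_items(instance, items, pos=0, index=0, count=0):
    """Backtracking match of instance[pos:] against items[index:].

    count is how many values the current item definition has consumed.
    """
    if index == len(items):
        return pos == len(instance)
    item = items[index]
    low, high = bounds(item)
    # Option 1: stop repeating this item and move to the next definition.
    if count >= low and match_items(instance, items, pos, index + 1, 0):
        return True
    # Option 2: consume one more value with the current definition.
    if (pos < len(instance) and (high is None or count < high)
            and isinstance(instance[pos], TYPES[item["type"]])):
        return match_items(instance, items, pos + 1, index, count + 1)
    return False

items = [
    {"type": "string", "minItems": 1, "maxItems": 3},  # stands in for foo
    {"type": "number", "minItems": 0},                 # stands in for bar
]
print(match_items(["a", "b", 1, 2], items))      # True
print(match_items([1, 2], items))                # False: needs one string
print(match_items(["a", "b", "c", "d"], items))  # False: too many strings
```

Backtracking matters once adjacent item types can overlap; with disjoint types, as here, a greedy scan would give the same answers.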

Example 4 – repeating groups of items in an array:

{
  "type": "array",
  "items": [
    { "minItems": 0,
      "maxItems": -1, // means unlimited
      "group": [
        { "type": "string", "minItems": 1, "maxItems": 3 },
        { "type": "object", "minItems": 0 }
      ]
    }
  ]
}
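
A repeating group can be matched by tracking the set of array positions reachable after each particle. The Python sketch below is one possible approach, not a proposal for the spec. It assumes that -1 or a missing maxItems means unlimited and that a particle with neither bound occurs exactly once; the function names are invented, and the group used in the usage lines is a strict (string, object) pair, which is an assumption about what example 4 intends.

```python
TYPES = {"string": str, "number": (int, float), "object": dict}

def bounds(particle):
    """Repetition bounds; the defaults are assumptions of this sketch."""
    if "minItems" not in particle and "maxItems" not in particle:
        return 1, 1                      # plain particle: exactly once
    high = particle.get("maxItems", -1)  # -1 (or absent) means unlimited
    return particle.get("minItems", 0), None if high == -1 else high

def step(particle, instance, starts):
    """End positions after matching the particle body exactly once."""
    if "group" in particle:
        return match_sequence(particle["group"], instance, starts)
    kind = TYPES[particle["type"]]
    return {p + 1 for p in starts
            if p < len(instance) and isinstance(instance[p], kind)}

def repeat(particle, instance, starts):
    """End positions after matching the particle between low and high times."""
    low, high = bounds(particle)
    # Unlimited repetition never needs more rounds than this.
    limit = low + len(instance) if high is None else high
    ends = set(starts) if low == 0 else set()
    current = set(starts)
    for n in range(1, limit + 1):
        current = step(particle, instance, current)
        if not current:
            break
        if n >= low:
            ends |= current
    return ends

def match_sequence(particles, instance, starts):
    """Match particles one after another, threading end positions through."""
    for particle in particles:
        starts = repeat(particle, instance, starts)
    return starts

def matches(instance, items):
    """True if the whole array is consumed by the item definitions."""
    return len(instance) in match_sequence(items, instance, {0})

pairs = [{"minItems": 0, "maxItems": -1,
          "group": [{"type": "string"}, {"type": "object"}]}]
print(matches(["apple", {}, "kiwi", {}], pairs))  # True
print(matches(["apple"], pairs))                  # False: pair incomplete
```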

Kris Zyp

May 10, 2010, 3:08:00 PM

to json-...@googlegroups.com, John Snyders

That is fine with me. Anyone else have any preferences on these?

We do already have union capabilities with the type property. You can
write your example above like:

{
  "type": [
    {
      "properties": {
        "town": {},
        "state": {}
      }
    },
    {
      "properties": {
        "state": { "optional": "true" }
      }
    }
  ]
}

> The following is how I think example 3 above could be handled.
>
> {
> "type": "array",
> "items": [
> { "type": { "$ref": /*ref to type A*/} },
> { "choice": [
> { "group": [
> { "type": { "$ref": /*ref to type B*/}, "optional": "true" },
> { "type": { "$ref": /*ref to type C*/} },
> ] },
> { "group": [
> { "type": { "$ref": /*ref to type D*/} },
> { "type": { "$ref": /*ref to type E*/} },
> ] }
> ] }
> ]
> }
>
>

This could also be done with a union type by creating two separate tuple
definitions.
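
A minimal Python sketch of the union-of-tuples idea follows. The type names are stand-ins for A through E (string for A and E, number for B, object for C, array for D), the function names are hypothetical, and the optional B is spelled out as its own alternative, which is an assumption about how optionality would be handled.

```python
TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}

def matches_tuple(instance, tuple_types):
    """True if the array has exactly these types, position by position."""
    return (len(instance) == len(tuple_types)
            and all(isinstance(value, TYPES[name])
                    for value, name in zip(instance, tuple_types)))

def matches_union(instance, alternatives):
    """True if the array matches any one of the tuple definitions."""
    return any(matches_tuple(instance, alt) for alt in alternatives)

alternatives = [
    ["string", "number", "object"],  # A, B, C
    ["string", "object"],            # A, C (B omitted)
    ["string", "array", "string"],   # A, D, E
]
print(matches_union(["a", 1, {}], alternatives))     # True
print(matches_union(["a", [], "e"], alternatives))   # True
print(matches_union(["a", 1, "e"], alternatives))    # False
```

The number of alternatives grows with every optional or repeated item, which is the trade-off against choice and group.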

> (In rereading this, the "ref to type" stuff is not clear. I'm trying
> to refer to a type defined elsewhere by name. Perhaps I want extends.
> I'm trying to reuse a type definition and at the same time add
> attributes such as optional that apply to the item.)
>
> I don't think additionalProperties should apply to arrays. First, the
> name is misleading since this is an array not an object. Second, if
> the instance can be an array or an object then this property could
> apply differently to each. If additionalProperties allows additional
> arbitrary items in the tuple array how do you keep the correspondence
> between the schema array and the instance array?
>
> I think choice and group can do more than additionalProperties. To
> allow additional arbitrary items just include a type: any item. Also
> minItems and maxItems should refer to repetitions of the item.
>

It does seem reasonable to not have additionalProperties apply to array
items, and to use minItems and maxItems in the tuple item definitions
instead. I am not sure the complexity of choice+group is worth
the advantage it provides; the need for it seems much rarer.

> Example 2 above could be handled like this:
>
> {
> "type": "array",
> "items": [
> { "type": { "$ref": /*ref to type foo*/}, "minItems": 1,
> "maxItems": 3 },
> { "type": { "$ref": /*ref to type bar*/}, "minItems": 0},
> ]
> }
>
> The optional attribute could also apply to items in an array. It would
> be short hand for minItems: 0 maxItems: 1.
>
> Example 4 – repeating groups of items in an array:
>
> {
> "type": "array",
> "items": [
> { minItems: 0,
> maxItems: -1, // means unlimited
> group: [
> { "type": "string", "minItems": 1, "maxItems": 3 },
> { "type": "object", "minItems": 0}
> ]
> }
> ]
> }
>
>
