Unique properties in "collection"


Rodrigo Alvarez

Oct 28, 2013, 5:51:05 AM
to json-...@googlegroups.com
Hi there,

I am in the process of prototyping a JSON Schema-based HTTP API using Ruby and Rack.
I would like to know whether there is a proper way to specify that a given property, or set of properties, must be unique among instances.

One option would be to have schemas for both instances and collections, where the collection is an array whose items $ref the instance schema, plus the uniqueness constraints, no?

Thanks in advance.

Geraint

Oct 28, 2013, 2:17:59 PM
to json-...@googlegroups.com
No, I'm afraid there is no such constraint.

It should be possible to construct a hyper-validator that checks "self" links, though - like, if two items have the same rel="self" link, then they should be equal.  That plus "uniqueItems" would cover you...
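
Just to sketch the idea (a rough example only; the "/customers/{id}" href is made up, and this assumes draft-04 hyper-schema "links"):

{
    "type": "array",
    "uniqueItems": true,
    "items": {
        "type": "object",
        "links": [
            {"rel": "self", "href": "/customers/{id}"}
        ]
    }
}

A hyper-validator could then complain if two items resolve to the same "self" href without being deep-equal, and "uniqueItems" rules out exact duplicates.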

mazswo...@gmail.com

Jul 21, 2014, 5:48:04 AM
to json-...@googlegroups.com
Hi, 

I am also interested in this topic. Basically, I would like to indicate in the JSON Schema that one of the object properties in an array shall be unique (for example, think of the ID column in a customers database).

In other words, it would be great to specify in the schema that it is OK to have zero or more "John Smith" entries in the array, as long as their "customerId" property values are unique among all array items.
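
For example (made-up data), this array should be valid:

[
    { "customerId": 1, "firstName": "John", "lastName": "Smith" },
    { "customerId": 2, "firstName": "John", "lastName": "Smith" }
]

but it should fail validation if both items had "customerId": 1.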

Is this doable with v4 schema? If not, do you think it could be implemented in v5 or v6?

Thank you.

Geraint

Jul 21, 2014, 7:07:59 AM
to json-...@googlegroups.com, mazswo...@gmail.com
This is not doable with v4 schema.

A (fairly simple) modification could be to allow "uniqueItems" to be a JSON Pointer path.  That is, you would say something like:

{
    "type": "array",
    "items": {...},
    "uniqueItems": "/customerId"
}

I think that would solve your particular case - how would people feel about something like that?

silvio....@gmail.com

Jul 21, 2014, 8:31:34 AM
to json-...@googlegroups.com, mazswo...@gmail.com
I wonder if this is not too restrictive. I mean, what about jointly unique fields or separately unique fields? Separately unique fields in particular are very common. In a user database, for instance, you often want a unique, unchanging id so you can refer to a user, and you also want a unique email address for login purposes.

Also, when talking about databases, it would be nice to have the same keywords that exist for arrays (minItems, maxItems, uniqueItems) for patternProperties, as that is just the unsorted, keyed version of the same thing. Unfortunately, I wouldn't know where to put these keywords in the case of patternProperties, but it would be equally important.

Bartek Gola

Jul 21, 2014, 12:30:09 PM
to silvio....@gmail.com, json-...@googlegroups.com
I agree with Silvio's comments. I believe we should aim for functionality similar to what relational databases provide for defining unique indexes. How about something like:

{
    "type": "array",
    "items": {},
    "uniqueItems": [ "customerId", "email" ]
}

However, this does not solve the "composite key" scenario. We could solve it with arrays of arrays, but the syntax gets clumsy along the way:

{
    "type": "array",
    "items": {},
    "uniqueItems": [ ["customerId"], ["email"], ["firstName", "lastName"] ]
}

Example #2 says (a concrete instance follows below):
   - "customerId" has to be unique
   - "email" has to be unique
   - "firstName" and "lastName" combined have to be unique, i.e. it is OK to have two or more people named "John", but at most one "John Smith" is allowed.

Matthew O'Donoghue

Jul 21, 2014, 6:11:55 PM
to json-...@googlegroups.com
Yes, it is too restrictive. JSON Schema's job is to define data; it shouldn't be concerned with business logic around that data. That's what the code interface is for.



mazswo...@gmail.com

Jul 22, 2014, 4:47:10 AM
to json-...@googlegroups.com
At the same time, validation of unique properties among instances is so ubiquitous that I believe it makes sense to include such a feature in the schema. There is no point in developers around the world implementing this kind of validation over and over again.

Also, we already have the "uniqueItems" feature, and this proposal is basically an extension of it.

silvio....@gmail.com

Jul 22, 2014, 9:21:45 AM
to json-...@googlegroups.com
I actually like the array-of-arrays solution. It may look a bit clumsy, but it covers a very wide range of use cases, it is intuitive (I would have suggested it myself if it didn't look so clumsy), and it is easily incrementally verifiable. However, that leaves the problem of patternProperties, which is much more database-like than arrays are, since arrays are ordered lists.

mazswo...@gmail.com

Jul 22, 2014, 11:54:03 AM
to json-...@googlegroups.com, silvio....@gmail.com
Yeah, maybe the array of arrays is not that bad at all. The longer I look at it the more I like it. ;-)

When it comes to patternProperties: can you please provide an example of when/how you would like to use it? What kind of use case would that be?

silvio....@gmail.com

Jul 22, 2014, 12:40:57 PM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com
It's just a question of how you look at it. I have always thought of databases more as hashtables. E.g.:

As an array, a DB could look like this:

[
  {
    "id": "id1234",
    "first name": "John",
    "last name": "Taylor",
    "email": "tay...@example.com"
  },
  ...
]

As patternProperties, it would look like this:

{
  "id1234": {
    "id": "id1234", // not sure about this line. It's not really needed, but it's still nice to have it
    "first name": "John"
    ...
  },
  ...
}

I mean, it's a conceptual difference. A list has an order; a hashtable has a key. A list can be reordered; a hashtable cannot. I guess you can just say at the display or backend or whatever interface that you don't care about the order of the list and that your key is one of the properties. That might solve the problem.

Also, I'm not sure it's a good idea to overload the uniqueItems keyword. Better to make a new one that accurately describes the array - something like "uniquePropertyArrays", or, more English and less JSON-ish, "uniquePropertyTuples".

mazswo...@gmail.com

Jul 22, 2014, 5:28:16 PM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com
The issue with databases/hashtables vs. arrays is that libraries like JSON Editor (https://github.com/jdorn/json-editor) work really nicely with arrays whenever you want to add additional items or delete existing ones. This is not the case for hashtables.

When it comes to overloading the uniqueItems keyword: that would actually not be the first keyword to accept two kinds of values. One example is the "additionalItems" keyword, which must be either a boolean or an object. In the case of uniqueItems it would be either a boolean or an array of arrays of strings. The nice feature of such an approach is that existing schemas would remain compatible with the new definition.
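
Side by side, the two accepted forms would then be (still just the proposal, of course, not valid draft v4):

{ "type": "array", "items": {}, "uniqueItems": true }

{ "type": "array", "items": {}, "uniqueItems": [ ["customerId"], ["email"], ["firstName", "lastName"] ] }

The first form keeps its current meaning; the second is the extension discussed above.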

Geraint

Jul 23, 2014, 8:16:18 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com
Yes - if the meanings are close enough, this overloading seemed reasonable to me.

I'm not sure about the array of arrays of strings, though.  It seems overly complicated, and could be covered using a slightly simpler syntax and "allOf":

{
    "type": "array",
    "allOf": [
        {"uniqueItems": ["/customerId"]},
        {"uniqueItems": ["/email"]},
        {"uniqueItems": ["/firstName", "/lastName"]}
    ]
}

I think "an array of strings" is complicated enough, and I think an array-of-arrays-of-strings is significantly less readable.

(This uniqueItems modification is still speculative, of course, but it is neat.)

mazswo...@gmail.com

Jul 23, 2014, 9:19:30 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com
The "allOf" syntax looks nice! 

How do we move from this idea to a JSON Schema standard enhancement candidate?

jere...@gmail.com

Jul 23, 2014, 11:06:41 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com
Every single other JSON Schema keyword can be fully expressed without using allOf and this would be the first one that couldn't.  For example:

{
    "allOf": [
        {"required": ["firstName", "lastName"]},
        {"required": ["email"]}
    ]
}

// Equivalent schema written without allOf
{
    "required": ["firstName", "lastName", "email"]
}

This may not seem like a big deal, but it complicates validation.  In my implementation in JSON Editor, I pre-process schemas to remove allOfs (merge them into the parent).  This vastly simplifies the rest of the validation.

The array of arrays approach is definitely more clumsy, but it has the benefit of behaving like the rest of the keywords.

A slight tweak could be making it an array whose entries are either strings or arrays of strings.  For example:

{
    "type": "array",
    "uniqueItems": ["/customerId", "/email", ["/firstName", "/lastName"]]
}

// Identical schema using allOf
// Note the double brackets for firstName, lastName

{
    "type": "array",
    "allOf": [
        {"uniqueItems": ["/customerId"]},
        {"uniqueItems": ["/email"]},
        {"uniqueItems": [["/firstName", "/lastName"]]}
    ]
}

Geraint

Jul 24, 2014, 11:20:11 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com, jere...@gmail.com
On Wednesday, 23 July 2014 16:06:41 UTC+1, jere...@gmail.com wrote:
Every single other JSON Schema keyword can be fully expressed without using allOf and this would be the first one that couldn't.

 
Actually, I strongly disagree.

Here's an analogy: say I have multiple sets of exclusive constraints:
* instances must either have a "length" property or an "end" property
* instances must either have an "author" property or a "group" property

The way to express this is:
{
    "allOf": [
        {
            "oneOf": [
                {"required": ["length"]},
                {"required": ["end"]}
            ]
        },
        {
            "oneOf": [
                {"required": ["author"]},
                {"required": ["group"]}
            ]
        }
    ]
}

I think this is extremely similar to your multiple sets of "uniqueItems" constraints.

Now, we could have invented an array-of-arrays-of-schemas syntax, like:
{
    "oneOf": [
        [
            {"required": ["length"]},
            {"required": ["end"]}
        ],
        [
            {"required": ["author"]},
            {"required": ["group"]}
        ]
    ]
}
(This is analogous to the discussed "array of arrays of strings" syntax for "uniqueItems".)

However, we don't have a syntax like that, and I don't think we actually need it for "uniqueItems" either.  In fact, unless we're also planning to enable similar array-of-arrays syntax for "oneOf" and "anyOf", I'd argue that we shouldn't support it for uniqueItems.

Geraint

mazswo...@gmail.com

Jul 25, 2014, 8:39:38 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com, jere...@gmail.com
If I understood correctly, you are against those proposals (which is just fine; that's what this group is for - to discuss and come up with the best ideas).

Do you have another idea for how to describe this feature in JSON Schema? I strongly believe it should become part of the standard, no matter what the syntax is.


Geraint

Jul 26, 2014, 12:10:44 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com, jere...@gmail.com
(sorry for short reply, on the move)

I don't object to the "array of strings" proposal, although I don't like the "array of arrays of strings" proposal, because we can do that with "allOf" (and be more in line with the other keywords).

It might be good to have the strings be JSON Pointers, because it's more flexible, but I wouldn't try to force it if nobody else agreed.

Geraint

Jeremy Dorn

Jul 26, 2014, 12:11:05 AM
to mazswo...@gmail.com, json-...@googlegroups.com, silvio....@gmail.com
Geraint, that's a good point about oneOf.  I didn't think of that use case.  I'm in favor of using the allOf syntax then.

mazswo...@gmail.com

Jul 29, 2014, 4:43:57 PM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com, jere...@gmail.com
So it looks like we all agree on the "array of strings" + "allOf" syntax. What's the next step to make it part of the JSON Schema standard?
Bartek

shawki...@gmail.com

Jul 29, 2014, 9:09:09 PM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com, jere...@gmail.com
I think that this discussion is about the same problem that I'm having but it seems to be framed differently. So I'm not sure that the proposed "array of strings" + "allOf" syntax actually deals with my situation.

I'm creating a JSON data structure to model genealogical data. Part of this involves modelling the content of documents such as a birth certificate. There are two elements of interest here: one lists the subjects in the document, and the other describes the relationships among the subjects.

At first it seems plausible that you can describe these two elements as follows:


"subject" : {
   
"type" : "array",
   
"items" : {    
       
"type" : "object",
       
"properties" : {    
           
"name" : { "type" : "string" },          
           
"role" : { "type" : "string" }, ...
}}}

"relationship" : {
   
"type" : "array",
   
"items" : {
       
"type" : "object",
       
"properties" : {
           
"source_subject" : { "type" : "string" },
           
"relationship" : { "type" : "string" },
           
"target_subject" : { "type" : "string" }
}}}


where the values of "source_subject" and "target_subject" are JSON pointers to items in the subject array.

The problem with this is that the pointers reference relative positions in the subject array, not stable identifiers. If the order of the elements in the array changes, say when an item is dropped, the referential integrity of the pointers is destroyed.
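
For example (made-up data), a relationship might point at the second subject as "/subject/1":

"subject" : [
    { "name" : "John Doe", "role" : "self" },
    { "name" : "Fred Doe", "role" : "parent" }
],
"relationship" : [
    { "source_subject" : "/subject/1", "relationship" : "father", "target_subject" : "/subject/0" }
]

Delete the "John Doe" entry, or insert a new subject in front of it, and both pointers now reference the wrong item or nothing at all.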

As suggested by silvio, a solution to this is to use keyed arrays (hashtables) instead of the relative arrays described by JSON Schema. Using keyed arrays, the subject and relationship data might look like this:

"subject" : {
   
"key1" : {
     
"name" : "John Doe",
     
"role" : "self"},
   
"key2" : {
     
"name" : "Fred Doe",
     
"role" : "parent"}
   
}
relationship
: [ {
   
"source_subject" : "/subject/key2",
   
"relationship" : "father",
   
"target_subject" : "/subject/key1" } ]


This works well, but now the subject collection cannot be described by JSON Schema: JSON Schema can only describe the values in a key/value pair, not the keys. It also assumes that the name of each key is known, and in situations like the subject collection the keys, in general, cannot be known ahead of time.

(You will notice that the absolute-key problem does not appear in the relationship array, but only because we are not trying to reference the array items from outside the array.)

This problem occurs over and over in complex data structures. Right now I maintain my data using keyed arrays. This helps ensure the referential integrity of JSON pointers across the entire data structure. To make use of Jeremy Dorn's excellent json-editor, I map keyed arrays into relative arrays that can be described by JSON Schema and then map them back once the edits are done.

It would be nice if JSON Schema could be amended to describe a collection of objects where each instance is identified by a unique key. It would be like the existing JSON Schema array structure, except that the "array" items would have explicit keys instead of implicit keys based on item order. Constraints could then be placed on the key (uniqueness being a critical one) as well as on the instance values.

Do your proposals actually cover this use case?

silvio

Aug 26, 2014, 3:11:42 PM
to shawki...@gmail.com, json-...@googlegroups.com, mazswo...@gmail.com, jere...@gmail.com
You will probably never get the perfect solution from JSON Schema. Ideally, you could check that the pointed-to element actually exists, but this would require values from a completely different part of the JSON tree to validate "relationship", and I don't think that will happen. That said, there are a couple of things you can do.

1) You can describe the keys in key/value pairs; that is what patternProperties is for - you give a regex as the key (see the sketch below). However, if the proposal goes through as currently suggested, you won't be able to use any of the uniqueness machinery there.
2) You can let go of JSON Pointer references and just use a value that you know will identify an item in the array - simply choose one of the unique properties.
3) You can use a presentation framework that gives more control, such as http://github.com/jdorn/json-editor#enum-values. This can do exactly what you want, including actually checking that the pointed-to element exists.
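
For (1), a sketch along these lines (the key regex is just an example) can already describe the keyed "subject" object in draft v4:

"subject" : {
    "type" : "object",
    "patternProperties" : {
        "^key[0-9]+$" : {
            "type" : "object",
            "properties" : {
                "name" : { "type" : "string" },
                "role" : { "type" : "string" }
            }
        }
    },
    "additionalProperties" : false
}

It constrains the shape of the keys and of each value, but says nothing about uniqueness across values, which is the gap discussed above.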

Anyway, I don't think it's possible to add anything to patternProperties without changing the syntax, which is probably not desirable.

So unless there are any concrete suggestions, I agree. What's the next step to making it a part of the JSON Schema standard?

ralf....@sap.com

Nov 21, 2014, 5:16:22 AM
to json-...@googlegroups.com, silvio....@gmail.com, mazswo...@gmail.com, jere...@gmail.com
I'm also interested in making this part of the JSON Schema standard.

Background: the OASIS OData Technical Committee (https://www.oasis-open.org/committees/odata) is currently defining a JSON format for OData service metadata, and we want to use JSON Schema as the basis for this format. OData is a RESTful protocol for interacting with structured data described by an entity-relationship model, and one aspect of OData entities is that they can be uniquely identified within their entity set by the values of their key properties.

colin....@gmail.com

Jul 9, 2015, 1:04:46 PM
to json-...@googlegroups.com
underneath "type":"object" I am going to extend (for my use only) with the following
"key" : ["prop1", "prop2", "prop3"....]

The "key" keyword just identifies the primary key using one or more properties.

so I would have this for instance

"items" :
{ // "items" represents the items within product array....
    "type" : "object",
    "key" : ["id"], // <<<<< here is the extension!
    "properties" : {
        "id": { "type": "string" },
        "name": { "type" : "string" },
        "desc": { "type" : "string" },
        "qty": { "type" : "number" }
    }
}

That solves a bucketload of issues for me. When JSON Schema finally has a light-bulb moment on simplicity, I will swap my keyword for whichever one they use. I can't see how it could get much simpler.
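
For instance (made-up data), the intent is that this product array would validate:

[
    { "id": "p1", "name": "Widget", "desc": "Basic widget", "qty": 3 },
    { "id": "p2", "name": "Widget", "desc": "Basic widget", "qty": 5 }
]

while a second item reusing "id": "p1" would be rejected by my extension.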