Designing Schemas for Polymorphic Collections


pete.h...@gmail.com

Jun 13, 2014, 4:46:49 PM
to json-...@googlegroups.com
I'm trying to think through what I imagine to be basic schema design issues. For simplicity's sake, assume I'm generating a JSON Schema for a hypothetical REST API comprising an assortment of resources with their own, varying schemas. Assume "vegetables" is one such resource. As one might expect, GET /vegetables will return a collection of vegetable resources. My current thinking is that it makes more sense to return a response like this:

GET /vegetables
{
    "item_count": 120,
    "items": [
        {"name": "squash"},
        {"name": "broccoli"},
        {"name": "asparagus"} 
    ]
}

Than one like this:

GET /vegetables
{
    "item_count": 120,
    "vegetables": [
        {"name": "squash"},
        {"name": "broccoli"},
        {"name": "asparagus"} 
    ]
}

Further assume that I want the resources returned in the collection to share a schema with the individual vegetable resources (GET /vegetables/14). That being the case, what is the best way to structure these schemas? Lots of individual schemas that reference each other? One giant schema for the entire API? Something else?

Any thoughts you have would be greatly appreciated, including arguments that the second example above is the preferable one. My current thinking is that creating a schema resource for each individual collection could lead to a headache if we ever wanted to change what our collections look like.
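One way to have GET /vegetables/14 and the items in GET /vegetables share a definition is a single schema document with a `definitions` section and same-document `$ref` pointers. The layout and the `resolve_local_ref` helper below are illustrative assumptions, not part of any library; the sketch only shows how a reference such as `#/definitions/vegetable` is resolved as a JSON Pointer.

```python
from typing import Any

# Hypothetical single-document layout: the item schema lives under
# "definitions", and the collection shape points at it with a local $ref.
API_SCHEMA: dict[str, Any] = {
    "definitions": {
        "vegetable": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
        "vegetable_collection": {
            "type": "object",
            "properties": {
                "item_count": {"type": "integer", "minimum": 0},
                "items": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/vegetable"},
                },
            },
        },
    }
}


def resolve_local_ref(root: dict[str, Any], ref: str) -> Any:
    """Resolve a same-document reference such as '#/definitions/vegetable'.

    Sketch only: URI percent-decoding of the fragment is not handled.
    """
    if not ref.startswith("#"):
        raise ValueError(f"not a same-document reference: {ref!r}")
    pointer = ref[1:]
    if pointer == "":
        return root  # "#" refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer fragment: {ref!r}")
    node: Any = root
    # JSON Pointer (RFC 6901): split on "/", then unescape ~1 -> "/" and ~0 -> "~"
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```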

Roger Costello

Jun 19, 2014, 3:15:47 PM
to json-...@googlegroups.com
Hello Pete,

This is not an answer to your question. Rather, it is a question about your example: what are you using json-schemas for in your example? Will you be validating the JSON-formatted data prior to sending it to the clients? Or will you distribute the schemas to the clients and invite them to validate the data that they receive from your RESTful web site? Insights into how you intend to use the schemas would be appreciated.  /Roger

Pete Holiday

Jun 19, 2014, 3:47:24 PM
to json-...@googlegroups.com
Hi Roger,
Thanks for your reply. Ideally it'd be all of the above. My plan is:

1. The API uses JSON Schema as something of a recipe to generate the right response
2. Those Schema documents will also be publicly available for clients to use in interpreting the API responses.

Ideally I'd be able to have a schema that describes the generic "collection" resource, which could contain any of our resources. My inclination is to have that collection response be polymorphic, but I'm having trouble designing a schema to match that intent.

Does that clear things up at all?

-Pete

Roger Costello

Jun 19, 2014, 4:15:34 PM
to json-...@googlegroups.com
Thanks Pete, that is very helpful. If you have more details on how you will use "JSON Schema as something of a recipe to generate the right response" I would be very interested to learn.
 
As for polymorphism in JSON Schema, wouldn't something along these lines work:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "car": { "type": "string", "maxLength": 20 },
        "juicer": { "type": "string", "maxLength": 20 }
    },
    "dependencies": {
        "car": {
            "type": "object",
            "properties": {
                "numPassengers": { "type": "integer", "minimum": 1 }
            },
            "required": ["numPassengers"]
        },
        "juicer": {
            "type": "object",
            "properties": {
                "kind": { "type": "string", "enum": ["press", "centrifugal"] }
            },
            "required": ["kind"]
        }
    }
}
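Under draft-04, a `dependencies` entry whose value is a schema means: if the instance has that property, the whole instance must also validate against that schema. The Python sketch below models only the `required` part of the example above to make that trigger behaviour concrete; `missing_dependency_fields` is a hypothetical helper, not part of any validator library.

```python
from typing import Any

# The "dependencies" section of the schema above, trimmed to the only
# keyword this sketch understands ("required").
DEPENDENCIES: dict[str, dict[str, Any]] = {
    "car": {"required": ["numPassengers"]},
    "juicer": {"required": ["kind"]},
}


def missing_dependency_fields(
    instance: dict[str, Any], dependencies: dict[str, dict[str, Any]]
) -> dict[str, list[str]]:
    """For each trigger property present in `instance`, list the fields its
    dependency schema requires but the instance lacks. An empty result means
    every triggered "required" constraint is satisfied."""
    problems: dict[str, list[str]] = {}
    for trigger, dep_schema in dependencies.items():
        if trigger not in instance:
            continue  # the dependency only applies when the trigger is present
        missing = [f for f in dep_schema.get("required", []) if f not in instance]
        if missing:
            problems[trigger] = missing
    return problems
```

Because the extra constraints are switched on by which property name is present, each kind of object needs its own distinguishing key for this pattern to work.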

Pete Holiday

Jun 19, 2014, 5:32:57 PM
to json-...@googlegroups.com
Thanks, Roger!
We're using JSON Schema as a "template" for our API responses. The Renderer object takes the Resource object (a thin layer/interface between the API and our Models) and the appropriate JSON Schema, works through the JSON Schema's definitions, and pulls the appropriate data out of the Resource object to build up the correct data structure, which is then serialized and sent back to the client. This allows us to guarantee that our responses will always match the advertised JSON Schema, and it makes adding new resources to our API a bit easier: build the JSON Schema to describe your response, then create the Resource class that knows how to get at that data. There's more to it, but that's the 50,000-foot view.
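The schema-driven rendering described above can be sketched roughly as follows. This is not Pete's actual Renderer: `render` and `VegetableResource` are hypothetical, the sketch assumes each schema property is exposed as an attribute (or a dict key) of the same name, and it handles only object, array and scalar types.

```python
from typing import Any


def render(schema: dict[str, Any], source: Any) -> Any:
    """Walk `schema` and build a JSON-ready value from `source`.

    Assumptions: objects expose each schema property as an attribute, or as
    a key when `source` is a dict; "$ref", "allOf" etc. are not handled.
    """
    schema_type = schema.get("type")
    if schema_type == "object":
        result: dict[str, Any] = {}
        for name, prop_schema in schema.get("properties", {}).items():
            if isinstance(source, dict):
                value = source.get(name)
            else:
                value = getattr(source, name, None)
            if value is None:
                continue  # omit missing data instead of emitting null
            result[name] = render(prop_schema, value)
        return result
    if schema_type == "array":
        return [render(schema.get("items", {}), item) for item in source]
    return source  # strings, numbers and booleans pass through unchanged


class VegetableResource:
    """Hypothetical thin wrapper between the API and a vegetable model."""

    def __init__(self, resource_id: int, name: str, color: str, note: str) -> None:
        self.id = resource_id
        self.name = name
        self.color = color
        self.internal_note = note  # not listed in the schema, so never rendered


VEGETABLE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "color": {"type": "string"},
    },
}
```

Because the output is built only from properties the schema lists, a field such as `internal_note` cannot leak into the response, which is one way to obtain the "always matches the advertised schema" guarantee.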

As for dependencies, I'll need to think on that suggestion a bit more (and read that section of the docs more carefully), but my gut reaction is that it might not quite get the job done. To demonstrate what I mean, I'll swap back to my vegetables example and elaborate on it a bit. Let's say that my API also includes a "Cheeses" resource. In typical RESTful style, then, 

GET /vegetables

would return a collection of vegetable resources and

GET /cheeses

would return a collection of cheeses. One (possibly unique) feature of our API is that we'd like for the collection endpoints (above) to return the *full* resource in the response. So, let's say our resources look like this:

GET /vegetables/12
{
    "id": 12,
    "name": "lettuce",
    "color": "green",
    "plant": "march",
    "harvest": "june"
}

GET /cheeses/34
{
    "id": 34,
    "name": "camembert",
    "region": "Normandy, France"
}

And our collection might look like this:

GET /cheeses
{
    "total_count": 10,
    "data": [
        {"id": 34, "name": "camembert", "region": "Normandy, France"},
        {"id": 35, "name": "maasdam", "region": "Netherlands"},
        {"id": 36, "name": "laguiole", "region": "Aveyron, France"}
    ]
}

That being the case, I know I have two options (there may be more than that):

1. Instead of having one "collection" resource, I have two: a vegetables collection and a cheeses collection
2. I include the individual resource schema for both cheese and vegetable in the collection schema and use oneOf to show that they're both permissible. 

This is where I feel that I might be missing something, because this seems like it must be a reasonably common scenario, but I'm running into a pitfall with each of those options:

1. If I do this, I drastically increase the work required to modify a collection schema (since now I have to change a bunch of individualized collection resources) and I risk future developers of the API changing a collection for a particular resource type and losing the benefit of having a consistent collection response.

2. This requires either the full duplication of the individual resource schema in a new place, or it requires some kind of "include" feature, so as to say "items in the data array must match one of the following external schema."

It could be that my life would be easier if I change my collection resource from using a unified (polymorphic) data property to one that uses, say, a cheeses property to hold an array of individual cheese resources, but I'd still need to be able to reference an external schema, which I haven't quite wrapped my head around yet.

I feel like I must be missing something!

Apologies for the wall of text, and thanks for your replies!

-Pete

Ben Hockey

Jun 20, 2014, 10:11:45 AM
to json-...@googlegroups.com


On Thursday, June 19, 2014 4:32:57 PM UTC-5, Pete Holiday wrote:

That being the case, I know I have two options (there may be more than that):

1. Instead of having one "collection" resource, I have two: a vegetables collection and a cheeses collection
2. I include the individual resource schema for both cheese and vegetable in the collection schema and use oneOf to show that they're both permissible. 

This is where I feel that I might be missing something, because this seems like it must be a reasonably common scenario, but I'm running into a pitfall with each of those options:

1. If I do this, I drastically increase the work required to modify a collection schema (since now I have to change a bunch of individualized collection resources) and I risk future developers of the API changing a collection for a particular resource type and losing the benefit of having a consistent collection response.

2. This requires either the full duplication of the individual resource schema in a new place, or it requires some kind of "include" feature, so as to say "items in the data array must match one of the following external schema."

It could be that my life would be easier if I change my collection resource from using a unified (polymorphic) data property to one that uses, say, a cheeses property to hold an array of individual cheese resources, but I'd still need to be able to reference an external schema, which I haven't quite wrapped my head around yet.

I feel like I must be missing something!


with that i'd say you've got 2 options that are similar to what you're considering but should address the issues you've raised.

1. similar to your first option, you can build multiple collection resources that extend a base collection resource by using allOf (my syntax might be slightly off - i'm more familiar with earlier drafts of json-schema). the first allOf entry references a base collection definition that describes the various top-level properties; the second extends it by saying that a collection has an "items" property and each of the objects in that array should be a vegetable (referenced here from a schema describing a single vegetable, though you could inline it if you want):

{
  "allOf": [
    { "$ref": "collection.json" },
    {
      "properties": {
        "items": {
          "type": "array",
          "items": { "$ref": "vegetable.json" }
        }
      }
    }
  ]
}

2. this option is almost exactly what you're thinking of for your 2nd option, except that with the awareness of $ref you no longer have the problem of needing to duplicate the individual resources. your collection schema would look something like:

{
  "type": "object",
  "properties": {
    "total_count": { "type": "number" },
    "items": {
      "type": "array",
      "items": {
        "oneOf": [
          { "$ref": "vegetable.json" },
          { "$ref": "cheese.json" }
        ]
      }
    }
  }
}
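A detail worth checking with `oneOf`: an array element is accepted only when it matches exactly one branch. If the vegetable and cheese schemas overlap (for example, if the vegetable schema requires nothing a cheese lacks), a cheese satisfies both branches and is rejected, whereas `anyOf` accepts it. The sketch below illustrates this with a deliberately simplified matcher that honours only `required`; the helpers and both pairs of item schemas are illustrative assumptions, not real validator code.

```python
from typing import Any

Schema = dict[str, Any]


def matches(schema: Schema, instance: dict[str, Any]) -> bool:
    """Simplified check: only the "required" keyword is honoured."""
    return all(key in instance for key in schema.get("required", []))


def one_of(branches: list[Schema], instance: dict[str, Any]) -> bool:
    # oneOf: valid only if exactly one branch matches
    return sum(matches(b, instance) for b in branches) == 1


def any_of(branches: list[Schema], instance: dict[str, Any]) -> bool:
    # anyOf: valid if at least one branch matches
    return any(matches(b, instance) for b in branches)


# Overlapping item schemas: every cheese also satisfies the vegetable branch.
LOOSE: list[Schema] = [
    {"required": ["id", "name"]},            # vegetable
    {"required": ["id", "name", "region"]},  # cheese
]
# Each branch requires a property the other type does not have.
STRICT: list[Schema] = [
    {"required": ["id", "name", "color"]},   # vegetable
    {"required": ["id", "name", "region"]},  # cheese
]
```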

i've been a bit out of touch with json-schema since draft v3, but my understanding is that something like those 2 options is possible and should address your concerns. i'd be glad to get your feedback about those suggestions to help me get back up to speed if i've misunderstood something.
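With the allOf approach, changes to the base collection already propagate through `$ref`. Generating the per-resource wrappers from one function would also cover changes to the extension part itself, such as renaming the `items` property. A sketch; `collection_schema_for`, `BASE_COLLECTION_REF` and the `*.json` file names are hypothetical.

```python
from typing import Any

# Assumed location of the shared base collection schema (hypothetical).
BASE_COLLECTION_REF = "collection.json"


def collection_schema_for(item_schema_ref: str) -> dict[str, Any]:
    """Build a draft-04 collection schema that extends the base via allOf
    and requires every element of "items" to match `item_schema_ref`."""
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "allOf": [
            {"$ref": BASE_COLLECTION_REF},
            {
                "properties": {
                    "items": {
                        "type": "array",
                        "items": {"$ref": item_schema_ref},
                    }
                }
            },
        ],
    }


# One generated schema per resource type. Edits to collection.json flow
# through $ref; edits to the wrapper shape only require changing the
# function above and regenerating.
COLLECTION_SCHEMAS = {
    name: collection_schema_for(f"{name}.json") for name in ("vegetable", "cheese")
}
```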

thanks,

ben...
    

Pete Holiday

Jun 20, 2014, 10:24:59 AM
to json-...@googlegroups.com
Oh, that's awesome. I had just been thinking about $ref as being local references only. Makes sense that it would be external as well. That all being the case, I do like the definitiveness of the first option, since we don't have any use cases right now where items would contain different types.

Thanks Ben & Roger for your help walking me through this! 

-Pete


Geraint

Jun 20, 2014, 11:08:09 AM
to json-...@googlegroups.com
On Thursday, 19 June 2014 22:32:57 UTC+1, Pete Holiday wrote:

That being the case, I know I have two options (there may be more than that):

1. Instead of having one "collection" resource, I have two: a vegetables collection and a cheeses collection
2. I include the individual resource schema for both cheese and vegetable in the collection schema and use oneOf to show that they're both permissible. 

This is where I feel that I might be missing something, because this seems like it must be a reasonably common scenario, but I'm running into a pitfall with each of those options:

1. If I do this, I drastically increase the work required to modify a collection schema (since now I have to change a bunch of individualized collection resources) and I risk future developers of the API changing a collection for a particular resource type and losing the benefit of having a consistent collection response.

The way I'd deal with this is to have a common "collection" schema, e.g.:
{
    "id": "/schemas/collection",
    "properties": {
        "count": {"type": "integer", "minimum": 0},
        "create": {"type": "string", "format": "uri"},
        ...
        "data": {
            "type": "array",
            "items": {"$ref": "/schemas/resource/common-type.json"}
        }
    }
}

Individual resources then have their own "collection" schemas.  These extend the common collection definition, giving more specific information about the items in "data":

{
    "id": "/schemas/resources/vegetable",
    ...
    "definitions": {
        "collection": {
            "allOf": [{"$ref": "/schemas/collection"}],
            "properties": {
                "data": {
                    "items": {"$ref": "#"}
                }
            }
        }
    }
}

Sure, if you decide to rename the "data" property to "collectionItems", then that impacts the subclassed/extended collection schemas. However, since the majority of the collection definition is defined centrally, most changes would only happen in one place.
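In this layout, `$ref` values are resolved against the `id` of the schema that contains them, so `"#"` inside the vegetable schema means the vegetable schema itself, and a reference to `/schemas/collection` reaches the common definition by its `id`. A stdlib-only sketch of that lookup, assuming an in-memory registry keyed by absolute URI and a hypothetical host `https://api.example.com`; a real validator would also fetch unknown documents and honour nested `id` scopes.

```python
from typing import Any
from urllib.parse import urljoin

BASE_URL = "https://api.example.com"  # hypothetical host the schema ids live under

COLLECTION: dict[str, Any] = {
    "id": "/schemas/collection",
    "properties": {"count": {"type": "integer", "minimum": 0}},
}
VEGETABLE: dict[str, Any] = {
    "id": "/schemas/resources/vegetable",
    "properties": {"name": {"type": "string"}},
    "definitions": {
        "collection": {
            "allOf": [{"$ref": "/schemas/collection"}],
            "properties": {"data": {"items": {"$ref": "#"}}},
        }
    },
}

# In-memory registry: absolute URI -> schema document.
REGISTRY: dict[str, dict[str, Any]] = {
    urljoin(BASE_URL, s["id"]): s for s in (COLLECTION, VEGETABLE)
}


def resolve_ref(ref: str, context_uri: str) -> Any:
    """Resolve `ref` relative to the URI of the schema that contains it."""
    doc_part, _, fragment = ref.partition("#")
    doc_uri = urljoin(context_uri, doc_part) if doc_part else context_uri
    node: Any = REGISTRY[doc_uri]  # KeyError here: a real validator would fetch
    if fragment:
        if not fragment.startswith("/"):
            raise ValueError(f"unsupported fragment: {fragment!r}")
        # Walk the JSON Pointer, unescaping ~1 -> "/" and ~0 -> "~"
        for token in fragment[1:].split("/"):
            node = node[token.replace("~1", "/").replace("~0", "~")]
    return node
```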
 
2. This requires either the full duplication of the individual resource schema in a new place, or it requires some kind of "include" feature, so as to say "items in the data array must match one of the following external schema."

The "include" feature you're thinking of is probably "$ref", e.g.
{
    "type": "array",
    "items": {
        "anyOf": [
            {"$ref": "/schemas/resources/vegetable.json"},
            {"$ref": "/schemas/resources/cheese.json"}
        ]
    }
}

However, this approach has the disadvantage that anybody wishing to verify one of your responses (e.g. /cheeses) ends up having to fetch (and test against!) the schemas for every resource type in your system.

This seems wasteful to me, which is part of why I tend to go for individualised "collection" schemas extending a common core.
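The cost described here can be made visible by listing the external documents a schema depends on. A sketch, where `external_refs` is a hypothetical helper that walks a schema tree and collects the document part of every non-local `$ref` (it does not follow those documents further):

```python
from typing import Any


def external_refs(schema: Any) -> set[str]:
    """Collect the document part of every non-local "$ref" in a schema tree."""
    found: set[str] = set()
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            found.add(ref.partition("#")[0])
        for value in schema.values():
            found |= external_refs(value)
    elif isinstance(schema, list):
        for value in schema:
            found |= external_refs(value)
    return found


POLYMORPHIC = {
    "type": "array",
    "items": {
        "anyOf": [
            {"$ref": "/schemas/resources/vegetable.json"},
            {"$ref": "/schemas/resources/cheese.json"},
        ]
    },
}
CHEESES_ONLY = {"type": "array", "items": {"$ref": "/schemas/resources/cheese.json"}}
```

For the polymorphic array this set grows with every resource type added to the API, while the dedicated cheeses schema needs only one document.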

Geraint

Pete Holiday

Jun 20, 2014, 1:51:52 PM
to json-...@googlegroups.com
That's a great point, and something I hadn't really considered. That seems to go away if we eliminate the polymorphism and change from "items" to individual "vegetables" and "cheeses" arrays, too. Could easily do both non-polymorphic AND common collection schema extension.

Will have to spend some time thinking about this. 

Thanks for all of the input so far!

