JSON Schema for Schema.org / Enumerations with properties


Jim Klo

Mar 28, 2013, 10:28:41 PM
to json-...@googlegroups.com
Greetings,

I'm in the process of defining a JSON Schema for Schema.org.

For the most part it seems relatively straightforward. However, a few of the things they define are "instances" of an enumeration that can also have properties.  Think of a product with a fixed status (instock, outofstock, preorder, backordered) whose properties can be adapted for a specific use: for example, a "label" property with corresponding values like "Available Now!", "Sorry, None left", "Coming Soon, Order Now", "Not Sure When More are Coming".

In Schema.org there's only a handful of these, but a specific example is MedicalAudience: http://schema.org/MedicalAudience

Suppose the Schema.org data is represented as application/microdata+json, as below, where "http://schema.org/Clinician" is an instance. In this contrived example, a Movie's audience is a Clinician described with a specific code:

{
  "items": [
    {
      "type": ["http://schema.org/Movie"],
      "properties": {
        "audience": [
          {
            "type": ["http://schema.org/Clinician"],
            "properties": {
              "code": [
                {
                  "type": ["http://schema.org/MedicalCode"],
                  "properties": {
                    "codeValue": ["ABC123"],
                    "codingSystem": ["ICD-10"]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

Here's what I've got so far for the JSON Schema. Does this seem like a reasonable (or even right) way to do this?  Note that I'm only showing part of the schema here... the generated schema is about 15K lines...

{
    "properties": {
        "type": {
            "items": {
                "enum": ["http://schema.org/MedicalAudience"]
            },
            "type": "array"
        },
        "id": {
            "$ref": "#/definitions/Text"
        }
    },
    "type": [{
        "$ref": "#/definitions/Audience"
    }, {
        "$ref": "#/definitions/MedicalEnumeration"
    }],
    "description": "Target audiences for medical web pages. Enumerated type.",
    "title": "Medical Audience"
}

BTW: when I do get this completed.. It will be Apache 2 licensed... 

Thanks,

- Jim



Geraint

Mar 29, 2013, 5:38:31 AM
to json-...@googlegroups.com
OK, so just to be clear - you have JSON representations of schemas on Schema.org, and you're writing a JSON Schema for that representation?

Or are you attempting to write a JSON Schema that is *equivalent* to the schema on Schema.org?

The second one is trickier - the schemas there are for XML, which is a fundamentally different structure from JSON.  In order to do something like that, you first have to define standard ways of translating from XML -> JSON, and JSON -> XML.  Once you have *that*, then you can define a JSON Schema for the JSON representations.

When it comes to your second example, it seems to have been written for XML, not JSON.  The "enum" at root which contains various URLs means that the instance has to be a string.  But strings in JSON have no properties - in fact, the use of this "enum" at the root level makes all other keywords useless.

Jim Klo

Mar 29, 2013, 11:38:24 AM
to <json-schema@googlegroups.com>, json-...@googlegroups.com



On Mar 29, 2013, at 2:38 AM, "Geraint" <gerai...@gmail.com> wrote:

> OK, so just to be clear - you have JSON representations of schemas on Schema.org, and you're writing a JSON Schema for that representation?


There may be JSON representations, but no validators exist for them. The validators that do exist are for RDFa; there is nothing for the JSON schema they provide (http://schema.rdfs.org/all.json). Even then, everything I find is really tailored for extraction of microdata, not conformance to a defined RDFa 'schema', if I dare call it that.

I suppose I could build my own validator using their schema - but then I'd need a validator for multiple platforms (Java, PHP, Python, JavaScript, etc.), which I'm not inclined to build. It just seemed to make more sense to translate the schema to a model that has existing validators.

> Or are you attempting to write a JSON Schema that is *equivalent* to the schema on Schema.org?


Sort of. Schema.org is defined as microdata defined in RDFa Lite. As it is an instance of HTML Microdata, we are encoding portable machine-readable structures of schema.org data in a JSON representation using the transformation described here: http://www.w3.org/html/wg/drafts/microdata/master/#json


Here's some examples of actual data encoded this way:

Note there is a flaw in this data - which is why we need a validator. 

This transformation round trips nicely as well - at least for the machine readable parts. 


My intention was to define a schema for this microdata+json variant of Schema.org rather than translate each instance into RDFa and then use an RDFa validator. That just seems less efficient, since I'll ultimately have millions of records to process.


> The second one is trickier - the schemas there are for XML, which is a fundamentally different structure from JSON.  In order to do something like that, you first have to define standard ways of translating from XML -> JSON, and JSON -> XML.  Once you have *that*, then you can define a JSON Schema for the JSON representations.


Agreed, as described I'm using the transformation provided in the HTML Microdata specification. 


> When it comes to your second example, it seems to have been written for XML, not JSON.  The "enum" at root which contains various URLs means that the instance has to be a string.  But strings in JSON have no properties - in fact, the use of this "enum" at the root level makes all other keywords useless.


So to be clear - the first example is an instance of JSON Microdata! The second is a partial schema for just the highlighted portion of the first. 

- Jim

--
You received this message because you are subscribed to a topic in the Google Groups "JSON Schema" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/json-schema/1-F-HYFtFDo/unsubscribe?hl=en-US.
To unsubscribe from this group and all its topics, send an email to json-schema...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Austin Wright

Mar 29, 2013, 12:16:14 PM
to json-...@googlegroups.com
RDFa is just a particular method of encoding RDF data into a DOM like HTML, so what do you mean exactly?

In my understanding, schema.org is an RDF vocabulary, which has little to do with JSON (unless perhaps you're also encoding the RDF as JSON-LD). Do you have a specific example of JSON data that you're trying to write a schema for?

Jim Klo

Mar 29, 2013, 2:28:55 PM
to json-...@googlegroups.com
Apparently I'm either not describing the case clearly - or no one is actually reading the full thread to understand the issue. :( 


On Friday, March 29, 2013 9:16:14 AM UTC-7, Austin Wright wrote:
> RDFa is just a particular method of encoding RDF data into a DOM like HTML, so what do you mean exactly?
>
> In my understanding, schema.org is an RDF vocabulary, which has little to do with JSON (unless perhaps you're also encoding the RDF as JSON-LD). Do you have a specific example of JSON data that you're trying to write a schema for?


Correct - the vocabulary isn't really tied to any particular encoding, per se. Most of the time, however, it's encoded as microdata embedded into an HTML DOM, which major search engines then use when spidering websites and indexing them for search.  My use case is slightly different - we aren't embedding anything into a DOM, as we need the data to be portable for import into different solutions for a variety of indexing purposes, including building a graph database that may be used to discover related objects.

The soon-to-be-released Schema.org V1.0 contains a new vocabulary extension, LRMI (Learning Resource Metadata Initiative), which adds a couple of new object types and additional properties to the existing CreativeWork object; these are really the core pieces I'm interested in.

Here's a Schema.org example record encoded in JSON using the algorithm described in HTML Microdata: 

         {
                "items": [{
                    "type": ["http://schema.org/CreativeWork"],
                    "id": "urn:www.khanacademy.org:node_slug:e/area_of_a_circle",
                    "properties": {
                        "name": ["Area of a circle"],
                        "author": [{
                            "type": ["http://schema.org/Person"],
                            "properties": {
                                "name": ["Omar Rizwan"]
                            }
                        }],
                        "url": ["http://www.khanacademy.org/exercise/area_of_a_circle"],
                        "dateCreated": ["2012-04-13T23:13:03Z"],
                        "educationalAlignment": [{
                            "type": ["http://schema.org/AlignmentObject"],
                            "id": "urn:corestandards.org:guid:8111E58EA0054B8C8DE2CF7AA27F2FD8",
                            "properties": {
                                "alignmentType": ["teaches"],
                                "educationalFramework": ["Common Core State Standards"],
                                "targetName": ["CCSS.Math.Content.7.G.B.4"],
                                "targetDescription": ["Know the formulas for the area and circumference of a circle and use them to solve problems; give an informal derivation of the relationship between the circumference and area of a circle."],
                                "targetUrl": ["http://corestandards.org/Math/Content/7/G/B/4"]
                            }
                        }],
                        "learningResourceType": ["exercise"]
                    }
                }]
            }

For the most part, I have a JSON Schema that covers this. DONE.
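
For reference, the shape that this encoding produces can be checked with a few lines of code. The following is only a minimal sketch in Python, not the generated schema: the function names `item_errors` and `document_errors` are made up for illustration, and it assumes that every item carries a "properties" object whose values are arrays of strings or nested items, with an optional "type" array and an optional "id" string.

```python
def item_errors(item, path="item"):
    """Return a list of shape problems for one microdata+JSON item."""
    if not isinstance(item, dict):
        return [f"{path}: expected an object"]
    errors = []
    # "type" is assumed optional; when present it must be an array of strings.
    types = item.get("type", [])
    if not isinstance(types, list) or not all(isinstance(t, str) for t in types):
        errors.append(f"{path}.type: expected an array of strings")
    if "id" in item and not isinstance(item["id"], str):
        errors.append(f"{path}.id: expected a string")
    properties = item.get("properties")
    if not isinstance(properties, dict):
        errors.append(f"{path}.properties: expected an object")
        return errors
    for name, values in properties.items():
        where = f"{path}.properties.{name}"
        if not isinstance(values, list):
            errors.append(f"{where}: expected an array of values")
            continue
        for index, value in enumerate(values):
            if isinstance(value, dict):
                # Nested items are checked recursively.
                errors.extend(item_errors(value, f"{where}[{index}]"))
            elif not isinstance(value, str):
                errors.append(f"{where}[{index}]: expected a string or a nested item")
    return errors


def document_errors(document):
    """Check the top-level {"items": [...]} wrapper, then every item in it."""
    items = document.get("items") if isinstance(document, dict) else None
    if not isinstance(items, list):
        return ["items: expected an array"]
    errors = []
    for index, item in enumerate(items):
        errors.extend(item_errors(item, f"items[{index}]"))
    return errors


# A property value given as a bare string instead of an array is reported:
flawed = {"items": [{"type": ["http://schema.org/CreativeWork"],
                     "properties": {"name": "Area of a circle"}}]}
print(document_errors(flawed))
# ['items[0].properties.name: expected an array of values']
```

This only checks structure; it knows nothing about which properties a given Schema.org type allows, which is the part the generated schema is meant to cover.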

As I'm trying to cover the entire Schema.org vocabulary, there's one edge condition that I'm not sure how to approach: what Schema.org describes as an instance.

The example I gave was this:

{
  "items": [
    {
      "type": ["http://schema.org/Movie"],
      "properties": {
        "audience": [
          {
            "type": ["http://schema.org/Clinician"],
            "properties": {
              "code": [
                {
                  "type": ["http://schema.org/MedicalCode"],
                  "properties": {
                    "codeValue": ["ABC123"],
                    "codingSystem": ["ICD-10"]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}

but an equally permissible expression for a Clinician as the audience, noting that it doesn't provide the same level of detail, is:

{
  "items": [
    {
      "type": ["http://schema.org/Movie"],
      "properties": {
        "audience": ["http://schema.org/Clinician"]
      }
    }
  ]
}


I'm just trying to figure out how to define the JSON Schema portion for this seemingly unusual condition.

After tinkering a bit... does this seem the right way for this?


    "properties": {
      "type": {
        "items": {
          "enum": ["http://schema.org/MedicalAudience"]
        },
        "type": "array"
      },
      "id": {
        "$ref": "#/definitions/Text"
      },
      "properties": {
        "type": "object"

Geraint Luff

Mar 30, 2013, 2:23:44 PM
to json-...@googlegroups.com
I admit I might have completely misunderstood, but it looks to me like you might want "oneOf" instead of "enum" in that final example.  Something like:

{
  "oneOf": [
    {"$ref": "http://schema.org/Clinician"},
    {"$ref": "http://schema.org/Patient"},
    {"$ref": "http://schema.org/Researcher"},
    {
      "properties": {
        "type": {
          "items": {
            "enum": ["http://schema.org/MedicalAudience"]
          },
          "type": "array"
        },
        "id": {
          "$ref": "#/definitions/Text"
        },
        ...
      }
    }
  ],
  "description": "Target audiences for medical web pages. Enumerated type.",
  "title": "Medical Audience"
}

The values in "enum" are *literal*, not schemas or references to schemas.
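
To make the "oneOf" behaviour concrete, here is a minimal sketch in Python of how such a schema is evaluated. It is not a real validator: `is_valid` and `MEDICAL_AUDIENCE` are hypothetical names, only the keywords type, enum, properties, items and oneOf are handled, and the three audience URLs are written as a literal "enum" branch rather than as "$ref"s so that the example is self-contained.

```python
TYPE_CHECKS = {
    "object": lambda value: isinstance(value, dict),
    "array": lambda value: isinstance(value, list),
    "string": lambda value: isinstance(value, str),
}


def is_valid(instance, schema):
    """Evaluate a tiny subset of JSON Schema: type, enum, properties, items, oneOf."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    # "enum" compares against literal values (only strings here, so == is safe).
    if "enum" in schema and instance not in schema["enum"]:
        return False
    # "properties" only constrains objects; other instance types ignore it.
    if isinstance(instance, dict):
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], subschema):
                return False
    if isinstance(instance, list) and "items" in schema:
        if not all(is_valid(item, schema["items"]) for item in instance):
            return False
    if "oneOf" in schema:
        # Exactly one branch must accept the instance.
        if sum(is_valid(instance, branch) for branch in schema["oneOf"]) != 1:
            return False
    return True


MEDICAL_AUDIENCE = {
    "oneOf": [
        # Branch 1: one of the literal URLs.
        {"enum": [
            "http://schema.org/Clinician",
            "http://schema.org/Patient",
            "http://schema.org/Researcher",
        ]},
        # Branch 2: data that follows the object schema.
        {
            "type": "object",  # added in this sketch, see the note below
            "properties": {
                "type": {
                    "type": "array",
                    "items": {"enum": ["http://schema.org/MedicalAudience"]},
                },
            },
        },
    ],
}

print(is_valid("http://schema.org/Clinician", MEDICAL_AUDIENCE))
print(is_valid({"type": ["http://schema.org/MedicalAudience"]}, MEDICAL_AUDIENCE))
print(is_valid("http://schema.org/Dentist", MEDICAL_AUDIENCE))
```

The explicit "type": "object" in the second branch is an addition of this sketch. Because "properties" places no constraint on non-objects, a bare string would otherwise satisfy that branch too, and a valid URL literal would then match two branches and fail "oneOf".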

Jim Klo

Mar 30, 2013, 3:24:59 PM
to <json-schema@googlegroups.com>, json-...@googlegroups.com
Response below. 
I was debating that myself. After more research into Schema.org instances, they seem to be just ordinary enums, with no weird properties. However, http://schema.org/Clinician, http://schema.org/Patient, and http://schema.org/Researcher are valid literals (which are strongly typed by domain).

I think I just need to treat MedicalAudience (and others like it) as enumerated types. 

For example, http://schema.org/Boolean falls into the same category.  It can be http://schema.org/True or http://schema.org/False, as well as a few other allowed expressions. The literals for this would be:

{
 "enum": [ "http://schema.org/True",  "http://schema.org/False", "true", "false", true, false ]
}

The question then becomes: can an enum be composed of heterogeneous types, or do I just need to use oneOf?

Thanks,

Geraint Luff

Mar 30, 2013, 3:54:18 PM
to json-...@googlegroups.com
An enum can contain heterogeneous types - there's no problem with that.  But your example had an "enum" at the root level with four values in it.

One of the values was a schema, but that is irrelevant as far as the "enum" keyword is concerned.  As far as "enum" is concerned, the valid values are now "http://schema.org/Clinician", "http://schema.org/Patient", "http://schema.org/Researcher" and the schema itself.

I suspect that instead of wanting the schema itself as a value, you want valid values to include data that follows the schema, and for that you need "oneOf", not "enum".

Does that make sense?
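
One caveat when checking a mixed enum like the Boolean one in code: JSON treats true and 1 as different values, but some host languages do not. The following is a minimal Python sketch that compares enum members as JSON literals. The helper names `json_equal` and `in_enum` are hypothetical and not part of any validator library.

```python
def json_equal(a, b):
    """JSON equality: unlike Python's ==, booleans never equal numbers."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, (dict, list)) or isinstance(b, (dict, list)):
        return False  # a container never equals a scalar or another container kind
    return a == b


def in_enum(instance, enum_values):
    """True when the instance is literally equal to one of the enum values."""
    return any(json_equal(instance, value) for value in enum_values)


BOOLEAN_ENUM = ["http://schema.org/True", "http://schema.org/False",
                "true", "false", True, False]

print(in_enum(True, BOOLEAN_ENUM))    # True
print(in_enum("false", BOOLEAN_ENUM)) # True
print(1 in BOOLEAN_ENUM)              # True, because Python's 1 == True
print(in_enum(1, BOOLEAN_ENUM))       # False, as JSON requires

# A schema placed inside "enum" is just another literal value:
AUDIENCE_ENUM = ["http://schema.org/Clinician", {"type": "object"}]
print(in_enum({"type": "object"}, AUDIENCE_ENUM))                         # True
print(in_enum({"type": ["http://schema.org/Clinician"]}, AUDIENCE_ENUM))  # False
```

The last two lines show the point made above: the only object such an enum accepts is one identical to the schema itself, not data that follows the schema.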

Thomas Hoppe

Aug 26, 2013, 4:10:26 PM
to json-...@googlegroups.com
@Jim Klo

Did you publish your efforts yet?

I experimented a little myself with creating 1:1 reflections, but I think you end up with very complex schemas,
which is why I took the approach of cherry-picking only what I found useful.

Jim Klo

Aug 26, 2013, 5:07:57 PM
to <json-schema@googlegroups.com>
I did… however we've been in discussion around revamping to use JSON-LD as the serialization of the data model as opposed to the Microdata -> JSON transform outlined in the HTML Microdata specification.

Ultimately I did a similar thing by cherry-picking - however moving to JSON-LD would simplify things a bunch.

Additionally the HTML Microdata transform spec changed a bit, so it's less complex, but not by much. It no longer seems to support type unions as it once did.

I plan on working on this revamp around the beginning of October.

- JK

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International
t. @nsomnac


Pavlik elf

Sep 25, 2013, 11:04:18 AM
to json-...@googlegroups.com

On Monday, August 26, 2013 11:07:57 PM UTC+2, Jim Klo wrote:
> I did… however we've been in discussion around revamping to use JSON-LD as the serialization of the data model as opposed to the Microdata -> JSON transform outlined in the HTML Microdata specification.
>
> Ultimately I did a similar thing by cherry-picking - however moving to JSON-LD would simplify things a bunch.

Not sure if you noticed that schema.org has adopted JSON-LD: http://blog.schema.org/2013/06/schemaorg-and-json-ld.html

Jim Klo

Sep 25, 2013, 12:48:25 PM
to <json-schema@googlegroups.com>
On Sep 25, 2013, at 8:04 AM, Pavlik elf <perpetua...@wwelves.org> wrote:


> On Monday, August 26, 2013 11:07:57 PM UTC+2, Jim Klo wrote:
> > I did… however we've been in discussion around revamping to use JSON-LD as the serialization of the data model as opposed to the Microdata -> JSON transform outlined in the HTML Microdata specification.
> >
> > Ultimately I did a similar thing by cherry-picking - however moving to JSON-LD would simplify things a bunch.
>
> not sure if you noticed that schema.org adopts use of JSON-LD http://blog.schema.org/2013/06/schemaorg-and-json-ld.html


Yes, I'm aware of this - the adoption happened after my work began… In any case, a few of us in the Schema.org, LRMI, and A11Y groups have been discussing the construction of a JSON Schema for JSON-LD formatted content using the respective vocabularies.  It's unclear how well JSON Schema will hold up as a schema for JSON-LD, given the variety of shortcut expressions and extensions that can be made to the @context.


Thomas Hoppe

Sep 26, 2013, 3:51:05 AM
to json-...@googlegroups.com
Thanks for sharing this Jim Klo.

This discussion is also somewhat related to the question of how JSON Schema and JSON-LD
relate to each other:

http://lists.w3.org/Archives/Public/public-linked-json/2013Aug/0046.html

Geraint

Sep 26, 2013, 6:49:35 AM
to json-...@googlegroups.com
On Wednesday, September 25, 2013 5:48:25 PM UTC+1, Jim Klo wrote:

> Yes, I'm aware of this - this adoption happened later than when my work began… In any case - there are a few of us on the Schema.org, LRMI, and A11Y groups that have been discussing the construction of a JSON schema for JSON-LD formatted content using the respective vocabularies.  It's unclear how well JSON Schema will hold up as a schema for JSON-LD given the variety of shortcut expressions and extensions that can be done to the @context.


Sounds like I should maybe get myself to those groups - this is something I've been mulling over for a while myself.  Could you point me to a few of these discussions?

Geraint

Jim Klo

Sep 26, 2013, 12:01:13 PM
to <json-schema@googlegroups.com>

On Sep 26, 2013, at 3:49 AM, Geraint <gerai...@gmail.com>
 wrote:

> On Wednesday, September 25, 2013 5:48:25 PM UTC+1, Jim Klo wrote:
>
> > Yes, I'm aware of this - this adoption happened later than when my work began… In any case - there are a few of us on the Schema.org, LRMI, and A11Y groups that have been discussing the construction of a JSON schema for JSON-LD formatted content using the respective vocabularies.  It's unclear how well JSON Schema will hold up as a schema for JSON-LD given the variety of shortcut expressions and extensions that can be done to the @context.
>
> Sounds like I should maybe get myself to those groups - this is something I've been mulling over for a while myself.  Could you point me to a few of these discussions?


For LRMI the main discussion has been here: http://goo.gl/id06Bh, A11y metadata project discussion is here: http://goo.gl/12T4Kq. However much of the JSON Schema discussion has happened off list between myself and key members of LRMI, A11y Metadata Project, and Schema.org. Jason Hoekstra at inBloom has been working on documenting most of the work. He and I are planning to tag team an attempt at a JSON Schema to be completed by end of year.

Like I mentioned, I think the real question on how to marry JSON-LD with JSON Schema is around use of @context.

{
  "@context": {
    "@vocab": "http://schema.org/"
  },
  "name": "Object Name"
}

and

{
  "http://schema.org/name": "Object Name"
}

are equivalent expressions (and I'm sure there's even 3 - 4 more variations as well) - I'm not sure how to make a JSON Schema without it getting really messy.

It seems that in order to make effective use of JSON Schema we either need to settle on a canonical JSON-LD form to be used when validating against a schema or settle on a single expression of the context that the schema will validate.

- JK


Geraint

Sep 26, 2013, 4:44:33 PM
to json-...@googlegroups.com
On Thursday, September 26, 2013 5:01:13 PM UTC+1, Jim Klo wrote:

> For LRMI the main discussion has been here: http://goo.gl/id06Bh, A11y metadata project discussion is here: http://goo.gl/12T4Kq. However much of the JSON Schema discussion has happened off list between myself and key members of LRMI, A11y Metadata Project, and Schema.org. Jason Hoekstra at inBloom has been working on documenting most of the work. He and I are planning to tag team an attempt at a JSON Schema to be completed by end of year.

Well, do drop me a line if there's anything I can do to help. 

> Like I mentioned, I think the real question on how to marry JSON-LD with JSON Schema is around use of @context.
>
> {
>   "@context": {
>     "@vocab": "http://schema.org/"
>   },
>   "name": "Object Name"
> }
>
> and
>
> {
>   "http://schema.org/name": "Object Name"
> }
>
> are equivalent expressions (and I'm sure there's even 3 - 4 more variations as well) - I'm not sure how to make a JSON Schema without it getting really messy.

Yeah - I think the issue here is what you're actually trying to validate.

JSON Schema validation is entirely concerned with JSON structure/representation, and those two representations are quite fundamentally different.  You could probably hack together something that validated this particular case (i.e. basically writing two schemas, depending on whether @vocab is set), but add in the ability to define arbitrary CURIE prefixes, and it becomes unmanageable.

Basically, you have two domains - the world of JSON, and the world of parsed JSON-LD (RDF).  The mapping from one to another is many-to-one, and you're attempting to place constraints on the wrong side of that many-to-one mapping.

> It seems that in order to make effective use of JSON Schema we either need to settle on a canonical JSON-LD form to be used when validating against a schema or settle on a single expression of the context that the schema will validate.

Yes, that does make sense - if you can make the mapping from RDF-world back to JSON-world one-to-one, then validation with JSON Schema becomes feasible.  I mean, even if you just said "Before validating, replace all properties with their full link relation URIs", then things suddenly get a whole lot more tractable.  Add in something like "always use arrays", and things get even simpler.
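
A rough sketch of such a pre-validation step in Python, under heavy assumptions: it only understands an inline "@context" that sets "@vocab", it leaves "@"-keywords and keys that already contain a colon untouched, and it does not attempt CURIE prefixes, term definitions or remote contexts. It is not the JSON-LD expansion algorithm, and `normalize` is a hypothetical name.

```python
def normalize(node, vocab=None):
    """Rewrite plain keys to full IRIs via @vocab and wrap every value in a list."""
    if isinstance(node, list):
        return [normalize(item, vocab) for item in node]
    if not isinstance(node, dict):
        return node  # strings, numbers, booleans and null pass through
    context = node.get("@context")
    if isinstance(context, dict):
        vocab = context.get("@vocab", vocab)
    result = {}
    for key, value in node.items():
        if key == "@context":
            continue  # the context has been applied, so it is dropped
        if key.startswith("@"):
            result[key] = value  # keywords such as @id are kept as they are
            continue
        # Assumption: a key containing ":" is already an absolute IRI.
        full_key = key if (":" in key or vocab is None) else vocab + key
        values = value if isinstance(value, list) else [value]
        result[full_key] = [normalize(v, vocab) for v in values]
    return result


with_vocab = {
    "@context": {"@vocab": "http://schema.org/"},
    "name": "Object Name",
}
with_full_iri = {"http://schema.org/name": "Object Name"}

print(normalize(with_vocab))
# {'http://schema.org/name': ['Object Name']}
print(normalize(with_vocab) == normalize(with_full_iri))
# True
```

Once both inputs collapse to the same structure, a single JSON Schema written against full IRIs and array values can cover them.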

Geraint

Sep 26, 2013, 4:51:39 PM
to json-...@googlegroups.com
I'm going to side-track a little here, about how I (originally) imagined JSON-LD and JSON Schema working together.  This is not necessarily the way things should be, just documenting where I'm coming from, so you know exactly which assumptions I'll need to be disabused of.  :p

For me, the most interesting part of JSON-LD was never the flexibility of what it could describe - it was describing an interpretation of existing data formats and APIs (the kind that have a fixed, concise, JSON-Schema-verifiable representation) as this exciting abstract Linked Data graph.

However, an API that reserves the right to pump out arbitrary (but equivalent) JSON-LD representations instead is only going to be reliably understood by a JSON-LD client.  I mean, I do understand that JSON-LD is actually just using JSON as a transport layer for RDF, but I was more captured by the idea of making existing JSON APIs work as a cool dual-purpose format.

So I guess what I'm saying is that when I thought about using JSON Schema and JSON-LD in the same API, I was thinking that contexts and schemas would come in pairs - and the schema would look like this:
{
    "type": "object",
    "properties": {
        "@context": {"enum": ["http://example.com/contexts/article.jsonld"]},
        ...
    }
}
So JSON Schema describes the structure, and the referenced JSON-LD context enables RDF interpretation of the data.  (In fact, if "propertyLinks" makes it into v5, then it might be possible to do something like generate the JSON-LD context from the schema, or whatever, but that's beside the point.)  The @context and the schema would act as a pair, documenting complementary aspects of the data.

That may or may not be what people on those lists are hankering for, but coming from a JSON Schema perspective, that's the aspect I find the most interesting.

ph...@civicagency.org

Feb 3, 2014, 12:17:27 PM
to json-...@googlegroups.com
Jim, I came across https://github.com/jimklo/schema-dot-org-json-schema-generator, but I'm wondering if there's been more recent work from the effort with Jason Hoekstra that you mentioned was planned for completion by the end of 2013?

Geraint

Feb 4, 2014, 10:09:21 AM
to json-...@googlegroups.com, ph...@civicagency.org
I also put something together at: https://github.com/geraintluff/schema-org-gen, which I've been using for some projects here and there.

Geraint

Jim Klo

Feb 4, 2014, 11:48:33 AM
to json-...@googlegroups.com
Hi Phil,

So yes, but it's not as thorough as I might have liked due to my lack of time.  I used a very directed solution for my use case within this Jasmine test spec: https://github.com/easypublish/EasyPublish/blob/master/js/tests/jsonld_spec.js


Unfortunately I ran out of time to update the schema generator.  Given what we have documented at this point, I think it should be trivial to update the generator, but alas - I'm no longer funded to work on that specific project.  So if anyone cares to pick up the torch, I'm happy for them to do so.

Best,

- JK

Essentially the conclusion was that the only reasonable way to deal with this