We are considering building a new feature for data.nytimes.com, and
would like to solicit feedback on what this community thinks is the
best way forward.
Right now, you can get any of the resources on this site as HTML or
RDF/XML. HTML is good for humans; RDF/XML is good for reasoners,
triplestores, etc. However, neither format is particularly well-suited
to client-side development, since processing XML (especially RDF/XML)
is a somewhat onerous task in JavaScript. For this reason, we think
it would be helpful to provide the same data in JSON.
The question then arises: how should you serialize RDF as JSON? One
approach is to serialize the output of a specialized SPARQL query as
JSON (http://dowhatimean.net/2006/05/rdfjson), but I find that the
JSON output by this approach is a bit awkward.
Another approach is to use the RDF/JSON specification developed by
Talis and described at http://n2.talis.com/wiki/RDF_JSON_Specification.
If we were to use this approach, then the JSON generated for our
resource for "Colbert, Stephen"
(http://data.nytimes.com/N66220017142656459133) would look like this:
{
  "http:\/\/data.nytimes.com\/N66220017142656459133" : {
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel" :
      [ { "value" : "Colbert, Stephen", "type" : "literal", "lang" : "en" } ],
    "http:\/\/data.nytimes.com\/elements\/associated_article_count" :
      [ { "value" : "46", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#int" } ],
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#inScheme" :
      [ { "value" : "http:\/\/data.nytimes.com\/elements\/nytd_per",
          "type" : "uri" } ],
    "http:\/\/data.nytimes.com\/elements\/first_use" :
      [ { "value" : "2002-12-10", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#date" } ],
    "http:\/\/data.nytimes.com\/elements\/number_of_variants" :
      [ { "value" : "1", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#int" } ],
    "http:\/\/data.nytimes.com\/elements\/search_api_query" :
      [ { "value" : "http:\/\/api.nytimes.com\/...", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type" :
      [ { "value" : "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept",
          "type" : "uri" } ],
    "http:\/\/data.nytimes.com\/elements\/topicPage" :
      [ { "value" : "http:\/\/topics.nytimes.com\...", "type" : "uri" } ],
    "http:\/\/www.w3.org\/2002\/07\/owl#sameAs" : [
      { "value" : "http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
        "type" : "uri" },
      { "value" : "http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
        "type" : "uri" },
      { "value" : "http:\/\/data.nytimes.com\/colbert_stephen_per",
        "type" : "uri" }
    ],
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#definition" :
      [ { "value" : "Stephen Colbert is the host of Comedy...",
          "type" : "literal", "lang" : "en" } ],
    "http:\/\/data.nytimes.com\/elements\/latest_use" :
      [ { "value" : "2009-08-26", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#date" } ]
  },
  "http:\/\/data.nytimes.com\/N66220017142656459133.rdf" : {
    "http:\/\/purl.org\/dc\/elements\/1.1\/creator" :
      [ { "value" : "The New York Times Company", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/xmlns.com\/foaf\/0.1\/primaryTopic" :
      [ { "value" : "http:\/\/data.nytimes.com\/N66220017142656459133",
          "type" : "uri" } ],
    "http:\/\/creativecommons.org\/ns#license" :
      [ { "value" : "http:\/\/creativecommons.org\/licenses\/by\/3.0\/us\/",
          "type" : "uri" } ],
    "http:\/\/creativecommons.org\/ns#attributionURL" :
      [ { "value" : "http:\/\/data.nytimes.com\/N66220017142656459133",
          "type" : "uri" } ],
    "http:\/\/purl.org\/dc\/terms\/rightsHolder" :
      [ { "value" : "The New York Times Company", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/creativecommons.org\/ns#attributionName" :
      [ { "value" : "The New York Times Company", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/purl.org\/dc\/terms\/modified" :
      [ { "value" : "2009-11-11", "type" : "literal",
          "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#date" } ]
  }
}
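To give a feel for what consuming this format involves, here is a
minimal sketch in Python (standing in for client-side code). It is only
an illustration, not part of the proposal: the helper name `values` and
the trimmed two-predicate document are assumptions made for the
example.

```python
import json

# A trimmed copy of the RDF/JSON document above (two predicates only).
RDF_JSON = r'''
{
  "http:\/\/data.nytimes.com\/N66220017142656459133" : {
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel" :
      [ { "value" : "Colbert, Stephen", "type" : "literal", "lang" : "en" } ],
    "http:\/\/www.w3.org\/2002\/07\/owl#sameAs" : [
      { "value" : "http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
        "type" : "uri" },
      { "value" : "http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
        "type" : "uri" }
    ]
  }
}
'''

SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"


def values(graph, subject, predicate):
    """Hypothetical helper: the "value" of every object for (subject, predicate)."""
    return [obj["value"] for obj in graph.get(subject, {}).get(predicate, [])]


# json.loads turns each escaped "\/" back into a plain "/".
graph = json.loads(RDF_JSON)
subject = "http://data.nytimes.com/N66220017142656459133"

label = values(graph, subject, SKOS + "prefLabel")
same_as = values(graph, subject, OWL + "sameAs")
```

Every lookup needs the full predicate URI, which is the awkwardness
that the unabbreviated namespaces introduce.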
I find this approach more readable than the SPARQL-based approach, but
since the namespaces are not abbreviated, I still think that this
JSON object would be a bit awkward for certain kinds of client-side
development. As such, I'm also considering a variant of the Talis
approach to RDF/JSON that abbreviates the namespaces and drops type
information. I feel that doing this increases the simplicity and
readability of the JSON object. Using this approach, the JSON
generated for our resource for "Colbert, Stephen"
(http://data.nytimes.com/N66220017142656459133) would look like this:
{
  "namespace" : {
    "cc" : "http:\/\/creativecommons.org\/ns#",
    "dcterms" : "http:\/\/purl.org\/dc\/terms\/",
    "time" : "http:\/\/www.w3.org\/2006\/time#",
    "dc" : "http:\/\/purl.org\/dc\/elements\/1.1\/",
    "nyt" : "http:\/\/data.nytimes.com\/elements\/",
    "rdf" : "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#",
    "foaf" : "http:\/\/xmlns.com\/foaf\/0.1\/",
    "skos" : "http:\/\/www.w3.org\/2004\/02\/skos\/core#",
    "owl" : "http:\/\/www.w3.org\/2002\/07\/owl#"
  },
  "http:\/\/data.nytimes.com\/N66220017142656459133" : {
    "skos:prefLabel" : [ "Colbert, Stephen" ],
    "nyt:associated_article_count" : [ "46" ],
    "skos:inScheme" : [ "http:\/\/data.nytimes.com\/elements\/nytd_per" ],
    "nyt:first_use" : [ "2002-12-10" ],
    "nyt:number_of_variants" : [ "1" ],
    "nyt:search_api_query" : [ "http:\/\/api.nytimes.com..." ],
    "nyt:topicPage" : [ "http:\/\/topics.nytimes.com\/top\/reference\/..." ],
    "owl:sameAs" : [
      "http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
      "http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
      "http:\/\/data.nytimes.com\/colbert_stephen_per"
    ],
    "skos:definition" : [ "Stephen Colbert is the host of Comedy..." ],
    "nyt:latest_use" : [ "2009-08-26" ]
  },
  "http:\/\/data.nytimes.com\/N66220017142656459133.rdf" : {
    "dc:creator" : [ "The New York Times Company" ],
    "foaf:primaryTopic" : [ "http:\/\/data.nytimes.com\/N66220017142656459133" ],
    "cc:license" : [ "http:\/\/creativecommons.org\/licenses\/by\/3.0\/us\/" ],
    "cc:attributionURL" : [ "http:\/\/data.nytimes.com\/N66220017142656459133" ],
    "dcterms:rightsHolder" : [ "The New York Times Company" ],
    "cc:attributionName" : [ "The New York Times Company" ],
    "dcterms:modified" : [ "2009-11-11" ]
  }
}
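For the curious, the abbreviated form can be derived mechanically from
the Talis form. The following is a rough Python sketch under stated
assumptions, not our actual implementation: the function name
`abbreviate`, the top-level key name "namespace", and the choice to
leave predicates with no known prefix as full URIs are all illustrative.

```python
def abbreviate(graph, namespaces):
    """Sketch: convert a Talis-style RDF/JSON dict into the abbreviated form.

    graph      -- {subject: {predicate: [{"value": ..., "type": ...}, ...]}}
    namespaces -- {prefix: namespace URI}
    """
    # Try longer namespace URIs first so the most specific prefix wins.
    by_length = sorted(namespaces.items(), key=lambda kv: len(kv[1]), reverse=True)

    def shorten(uri):
        for prefix, base in by_length:
            if uri.startswith(base):
                return prefix + ":" + uri[len(base):]
        return uri  # assumption: unknown namespaces keep the full URI

    # The key name "namespace" is an assumption for this sketch.
    result = {"namespace": dict(namespaces)}
    for subject, predicates in graph.items():
        result[subject] = {
            # Keep only each object's value; type, lang and datatype are dropped.
            shorten(predicate): [obj["value"] for obj in objects]
            for predicate, objects in predicates.items()
        }
    return result


# Usage with a one-predicate excerpt of the Talis-style document.
talis = {
    "http://data.nytimes.com/N66220017142656459133": {
        "http://www.w3.org/2004/02/skos/core#prefLabel": [
            {"value": "Colbert, Stephen", "type": "literal", "lang": "en"}
        ]
    }
}
prefixes = {"skos": "http://www.w3.org/2004/02/skos/core#"}
short = abbreviate(talis, prefixes)
```

One consequence of dropping type information is visible here: once only
the values remain, a client can no longer tell a URI from a literal
that happens to look like one (nyt:search_api_query, for example, is a
literal in the Talis form).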
So what do you think? Should we provide JSON objects in the Talis
format, in the abbreviated format, or in both formats? I'm leaning
towards providing both. Thanks for your feedback.
Evan
--
Evan Sandhaus
Semantic Technologist
New York Times Research + Development
p.s. Thank you to Ian Davis and Sam Tunnicliffe of Talis for their
support in my evaluation of approaches to RDF/JSON serialization.