RDF in JSON on http://data.nytimes.com


Evan Sandhaus

Nov 20, 2009, 10:51:24 AM
to The New York Times Linked Open Data Community
We are considering building a new feature for data.nytimes.com, and
would like to solicit feedback on what this community thinks is the
best way forward.

Right now, you can get any of the resources on this site as HTML or
RDF/XML. HTML is good for humans, RDF/XML is good for reasoners/
triplestores/etc. However, neither format is particularly well-suited
to client-side development, since processing XML (especially RDF/XML)
is a somewhat onerous task in JavaScript. For this reason, we think
it would be helpful to provide the same data in JSON.

The question then arises: how should you serialize RDF as JSON? One
approach is to serialize the output of a specialized SPARQL query as
JSON (http://dowhatimean.net/2006/05/rdfjson), but I find that the
JSON output by this approach is a bit awkward.

Another approach is to use the RDF/JSON specification developed by
Talis and described at http://n2.talis.com/wiki/RDF_JSON_Specification.
If we were to use this approach, the JSON generated for our resource
for "Colbert, Stephen" (http://data.nytimes.com/N66220017142656459133)
would look like this:

{
  "http:\/\/data.nytimes.com\/N66220017142656459133" : {
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel" :
      [ { "value" : "Colbert, Stephen", "type" : "literal", "lang" : "en" } ],
    "http:\/\/data.nytimes.com\/elements\/associated_article_count" :
      [ { "value" : "46", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#int" } ],
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#inScheme" :
      [ { "value" : "http:\/\/data.nytimes.com\/elements\/nytd_per", "type" : "uri" } ],
    "http:\/\/data.nytimes.com\/elements\/first_use" :
      [ { "value" : "2002-12-10", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#date" } ],
    "http:\/\/data.nytimes.com\/elements\/number_of_variants" :
      [ { "value" : "1", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#int" } ],
    "http:\/\/data.nytimes.com\/elements\/search_api_query" :
      [ { "value" : "http:\/\/api.nytimes.com\/...", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type" :
      [ { "value" : "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept", "type" : "uri" } ],
    "http:\/\/data.nytimes.com\/elements\/topicPage" :
      [ { "value" : "http:\/\/topics.nytimes.com\/...", "type" : "uri" } ],
    "http:\/\/www.w3.org\/2002\/07\/owl#sameAs" : [
      { "value" : "http:\/\/dbpedia.org\/resource\/Stephen_Colbert", "type" : "uri" },
      { "value" : "http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert", "type" : "uri" },
      { "value" : "http:\/\/data.nytimes.com\/colbert_stephen_per", "type" : "uri" }
    ],
    "http:\/\/www.w3.org\/2004\/02\/skos\/core#definition" :
      [ { "value" : "Stephen Colbert is the host of Comedy...", "type" : "literal", "lang" : "en" } ],
    "http:\/\/data.nytimes.com\/elements\/latest_use" :
      [ { "value" : "2009-08-26", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#date" } ]
  },
  "http:\/\/data.nytimes.com\/N66220017142656459133.rdf" : {
    "http:\/\/purl.org\/dc\/elements\/1.1\/creator" :
      [ { "value" : "The New York Times Company", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/xmlns.com\/foaf\/0.1\/primaryTopic" :
      [ { "value" : "http:\/\/data.nytimes.com\/N66220017142656459133", "type" : "uri" } ],
    "http:\/\/creativecommons.org\/ns#license" :
      [ { "value" : "http:\/\/creativecommons.org\/licenses\/by\/3.0\/us\/", "type" : "uri" } ],
    "http:\/\/creativecommons.org\/ns#attributionURL" :
      [ { "value" : "http:\/\/data.nytimes.com\/N66220017142656459133", "type" : "uri" } ],
    "http:\/\/purl.org\/dc\/terms\/rightsHolder" :
      [ { "value" : "The New York Times Company", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/creativecommons.org\/ns#attributionName" :
      [ { "value" : "The New York Times Company", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#string" } ],
    "http:\/\/purl.org\/dc\/terms\/modified" :
      [ { "value" : "2009-11-11", "type" : "literal", "datatype" : "http:\/\/www.w3.org\/2001\/XMLSchema#date" } ]
  }
}
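
For a sense of what client code against this structure looks like, here is a
rough sketch (variable names are purely illustrative, and it assumes the
response has already been parsed into an object called data; note that the \/
escapes above are just plain / once the JSON is parsed):

// Sketch: reading values out of the full-URI object.
var uri  = "http://data.nytimes.com/N66220017142656459133";
var SKOS = "http://www.w3.org/2004/02/skos/core#";
var OWL  = "http://www.w3.org/2002/07/owl#";

var label = data[uri][SKOS + "prefLabel"][0]["value"];  // "Colbert, Stephen"

var sameAs = [];
var objects = data[uri][OWL + "sameAs"];
for (var i = 0; i < objects.length; i++) {
  sameAs.push(objects[i]["value"]);  // DBpedia, Freebase and NYT URIs
}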

I find this approach more readable than the SPARQL-based approach, but
- since the namespaces are not abbreviated - I still think that this
JSON object would be a bit awkward for certain kinds of client-side
development. As such, I'm also considering a variant on the Talis
approach to RDF/JSON that abbreviates the namespaces and drops type
information. I feel that doing this increases the simplicity and
readability of the JSON object. Using this approach, the JSON
generated for our resource for "Colbert, Stephen"
(http://data.nytimes.com/N66220017142656459133) would look like this:

{
  "namespace" : {
    "cc" : "http:\/\/creativecommons.org\/ns#",
    "dcterms" : "http:\/\/purl.org\/dc\/terms\/",
    "time" : "http:\/\/www.w3.org\/2006\/time#",
    "dc" : "http:\/\/purl.org\/dc\/elements\/1.1\/",
    "nyt" : "http:\/\/data.nytimes.com\/elements\/",
    "rdf" : "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#",
    "foaf" : "http:\/\/xmlns.com\/foaf\/0.1\/",
    "skos" : "http:\/\/www.w3.org\/2004\/02\/skos\/core#",
    "owl" : "http:\/\/www.w3.org\/2002\/07\/owl#"
  },
  "http:\/\/data.nytimes.com\/N66220017142656459133" : {
    "skos:prefLabel" : [ "Colbert, Stephen" ],
    "nyt:associated_article_count" : [ "46" ],
    "skos:inScheme" : [ "http:\/\/data.nytimes.com\/elements\/nytd_per" ],
    "nyt:first_use" : [ "2002-12-10" ],
    "nyt:number_of_variants" : [ "1" ],
    "nyt:search_api_query" : [ "http:\/\/api.nytimes.com\/..." ],
    "nyt:topicPage" : [ "http:\/\/topics.nytimes.com\/top\/reference\/..." ],
    "owl:sameAs" : [
      "http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
      "http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
      "http:\/\/data.nytimes.com\/colbert_stephen_per"
    ],
    "skos:definition" : [ "Stephen Colbert is the host of Comedy..." ],
    "nyt:latest_use" : [ "2009-08-26" ]
  },
  "http:\/\/data.nytimes.com\/N66220017142656459133.rdf" : {
    "dc:creator" : [ "The New York Times Company" ],
    "foaf:primaryTopic" : [ "http:\/\/data.nytimes.com\/N66220017142656459133" ],
    "cc:license" : [ "http:\/\/creativecommons.org\/licenses\/by\/3.0\/us\/" ],
    "cc:attributionURL" : [ "http:\/\/data.nytimes.com\/N66220017142656459133" ],
    "dcterms:rightsHolder" : [ "The New York Times Company" ],
    "cc:attributionName" : [ "The New York Times Company" ],
    "dcterms:modified" : [ "2009-11-11" ]
  }
}
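
With the abbreviated form, the same lookups shrink to something like this
(again only a sketch, assuming the prefixes stay stable and the response is
parsed into an object called data):

// Sketch: the same lookups against the abbreviated object.
var uri = "http://data.nytimes.com/N66220017142656459133";

var label  = data[uri]["skos:prefLabel"][0];  // "Colbert, Stephen"
var sameAs = data[uri]["owl:sameAs"];         // array of URIs, ready for third-party API calls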

So what do you think? Should we provide JSON objects in the Talis
format, in the abbreviated format, or in both formats? I'm leaning
towards providing both. Thanks for your feedback.

Evan
--
Evan Sandhaus
Semantic Technologist
New York Times Research + Development

p.s. Thank you to Ian Davis and Sam Tunnicliffe of Talis for their
support in my evaluation of approaches to RDF/JSON serialization.

Pius Uzamere

Nov 20, 2009, 10:58:10 AM
to nyt_linked...@googlegroups.com
The latter representation is actually exactly what I've been using while experimenting with your data.  So, +1 to providing a Talis RDF/JSON serialization of the data with CURIEs.

-Pius

Kingsley Idehen

Nov 20, 2009, 11:09:56 AM
to nyt_linked...@googlegroups.com
Evan Sandhaus wrote:
> We are considering building a new feature for data.nytimes.com, and
> would like to solicit feedback on what this community thinks is the
> best way forward.
>
> Right now, you can get any of the resources on this site as HTML or
> RDF/XML. HTML is good for humans, RDF/XML is good for reasoners/
> triplestores/etc. However, neither format is particularly well-suited
> to client-side development, since processing XML (especially RDF/XML)
> is a somewhat onerous task in JavaScript. For this reason, we think
> it would be helpful to provide the same data in JSON.
>
Yes, but at the heart of Linked Data publishing, based on its HTTP core,
lies the ability to negotiate data representations. So why not let user
agents negotiate the representation they want? Basically, the following
should be negotiable (a quick client-side sketch follows the list):

1. HTML+RDFa
2. JSON
3. N3/Turtle
4. RDF/XML
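
For example, a user agent can ask the same resource URI for whichever
representation it wants just by setting the Accept header. A rough sketch
(the media types shown are the common ones, and it glosses over cross-domain
restrictions):

// Sketch: HTTP content negotiation from JavaScript.
var req = new XMLHttpRequest();
req.open("GET", "http://data.nytimes.com/N66220017142656459133", true);
// ask for JSON; "application/rdf+xml" or "text/turtle" would fetch the other representations
req.setRequestHeader("Accept", "application/json");
req.onreadystatechange = function () {
  if (req.readyState === 4 && req.status === 200) {
    var data = JSON.parse(req.responseText);
    // ... use the data
  }
};
req.send(null);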

> The question then arises: how should you serialize RDF as JSON?
You can use technology that just handles the different data
representations; this is basically what Linked Data servers [1][2] are
about. This is how DBpedia works, etc.
The most important item for you is the location of the data you are
publishing, i.e., where it is stored. Once the data location is clear, it's
all about using a Linked Data server if you don't want to burden yourself
on a per-representation-request basis, which will ultimately be the case.

As stated above, take as many cues as possible from the DBpedia project;
at the end of the day, we built that project to showcase how Linked Data
should be deployed on the Web :-)



Links:

1. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ --
Linked Data deployment tutorial that uses Pubby as the Linked Data
server example
2. http://virtuoso.openlinksw.com/Whitepapers/html/vdld_html/VirtDeployingLinkedDataGuide.html
-- Linked Data deployment using Virtuoso's Linked Data server
3. http://dbpedia.org/resource/Linked_Data -- look at the page footer
and the <head/> section via view-source mode, and of course the RDFa;
it's all there.


--


Regards,

Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com




Matt M. Kaufman

Nov 20, 2009, 3:57:59 PM
to The New York Times Linked Open Data Community
This is a good post. It is good to see this level of thought and
discovery being pursued by your team for your data APIs.....

1) I like the fact that they are not hyped or massively deployed

2) They are purely about the actual data and API access itself

3) Pretty open and formless development it seems

4) Not overflooded by idiots or rampant client developers

5) This message board is amazing... There's like... 8 Topics! That is
excellent and focused.

6) I would like to get involved more, are you guys developing this out
of Manhattan?

--->

Oh, my core point is: Thanks for *thinking* out of the ....
"box" (which is, actually, quite an in-the-box term, hehe)! The "standard"
"standards" out there for API data publishing/retrieval, schema, and
definition are VERY lacking, to the point of a cluster-*uck....

Formats like RDF/JSON/Blah.... are .... just that. There are better
ways, and it's NOT about the *format* in which the data is
published....

I think the correct methods rely on something combining a globally
applicable and compliant graphical language-context mapping type
structure system .... for the access points (i.e., URLs or function
calls) and also for the data that is returned itself. It should also
support auto-discovery, like Jabber/XMPP servers or the protocol do.
E.g., auto-discovery of URLs and of the content that is returned itself?

Oh. Speaking of this, has anybody thought of using Sphinx or Solr to
auto-classify and map unstructured data into a structure? I haven't
gotten to playing with this yet.

And last thing: sorry if parts of this post don't fit the actual topic
here. Anyway, if you guys are in Manhattan, I'd love to speak with one
or a couple of you over coffee or lunch, or even quickly on the phone,
to get a feel for what you're working on and where it's being pointed
(your direction, I mean)... So far, the last couple of months or more
(around 6) have increased my suspicion that you're actually working on
something that is going to be quite worth it.....

Anyway,

Thanks,
Matt Kaufman...

Hopefully this might spark some discussion and get me involved enough
to contribute something.

glenn mcdonald

Nov 20, 2009, 10:20:12 PM
to The New York Times Linked Open Data Community
Your second JSON format is much nicer. Basically a JSON-ization of N3.

And to combine two discussions, the data-model change I'm suggesting
you make is to replace your "owl:sameAs" with "nyt:about", so where
you have this:

"owl:sameAs" : [
"http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
"http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
"http:\/\/data.nytimes.com\/colbert_stephen_per"
],

That would produce an unusable mess for anybody who tried to combine
your data with a similarly modeled set from another newspaper. Instead,
you'd have this:

"nyt:about" : [
"http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
"http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
"http:\/\/data.nytimes.com\/colbert_stephen_per"
],

or, to make this even better as an example for other papers to follow,
go ahead and put in the implicit self:

"nyt:publication" : [ "New York Times" ],
"nyt:about" : [
"http:\/\/dbpedia.org\/resource\/Stephen_Colbert",
"http:\/\/rdf.freebase.com\/ns\/en.stephen_colbert",
"http:\/\/data.nytimes.com\/colbert_stephen_per"
],

Trivial change, syntactically, but huge win, semantically. Now any
number of newspaper datasets modeled like this can be combined, as is,
without any messing around with entailment or predicate remapping. The
future will thank you, over and over again!

glenn mcdonald

Nov 20, 2009, 10:27:56 PM
to The New York Times Linked Open Data Community
Oh, and you don't need to escape slashes in JSON. So this all could be
even more readable:

"nyt:publication" : [ "New York Times" ],
"nyt:about" : [
      "http://dbpedia.org/resource/Stephen_Colbert",
      "http://rdf.freebase.com/ns/en.stephen_colbert",
      "http://data.nytimes.com/colbert_stephen_per"
    ],

Keith Alexander

Nov 21, 2009, 6:52:00 AM
to nyt_linked...@googlegroups.com
Hi,

On Sat, Nov 21, 2009 at 3:20 AM, glenn mcdonald <gmcd...@furia.com> wrote:
> Your second JSON format is much nicer. Basically a JSON-ization of N3.
>

I was involved in the design of the 'Talis' RDF/JSON (the
specification is published on our wiki, but it was developed in
consultation with other interested tool developers, etc., and is
implemented by various Talis and non-Talis toolkits and web services).

An early iteration of the design was pretty much the same as the
proposal with prefixes. It does look nicer, and seems like a good idea
on the face of it. But after writing code using it for a few weeks, I
came to the conclusion that it would be better to have full URIs
without prefixes. Uglier-looking data serialisation, but prettier-looking
code.

What we came up with isn't necessarily the simplest, prettiest JSON
structure - if you don't need language tags or datatypes, or don't
want to merge JSON objects from two or more graphs, you can come up
with something simpler.

What I found with the namespaces declaration is that you need to ignore it
when iterating over the resources (more code), and, unless you have a
closed loop, where you know for sure in advance which namespaces will
be bound to which prefixes, you can't just write:

var title = rdf[uri]["dct:title"][0];

in case the prefix is actually dcterms: or terms: or ns0: or whatever.
So you'd have to iterate over the namespaces object to find the prefix
used in this document. More code. And it's not so bad to just write:

var DCT = 'http://purl.org/dc/terms/';
var title = rdf[uri][DCT + 'title'][0]['value'];

Similarly, when merging graphs, it becomes much more complicated
unless the namespaces used are definitely exactly the same in both
graphs.
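
To make that concrete, merging two resource-keyed objects that use full URIs
is just a couple of nested loops (a rough sketch, not production code):

// Sketch: merging two RDF/JSON graphs keyed on full URIs.
// Naive: it concatenates value arrays and doesn't de-duplicate.
function mergeGraphs(a, b) {
  var merged = {}, graphs = [a, b];
  for (var g = 0; g < graphs.length; g++) {
    for (var subject in graphs[g]) {
      if (!merged[subject]) { merged[subject] = {}; }
      for (var predicate in graphs[g][subject]) {
        if (!merged[subject][predicate]) { merged[subject][predicate] = []; }
        merged[subject][predicate] =
          merged[subject][predicate].concat(graphs[g][subject][predicate]);
      }
    }
  }
  return merged;
}

With prefixed keys, the same predicate could turn up under two different
keys ("dct:title" in one graph, "terms:title" in the other), so you'd have
to normalise against both namespace tables before you could even begin.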

That's why we came to the conclusion that it was better to do away
with the namespaces part anyway. You might come to different
conclusions if your use cases are different (i.e., users can rely on
your prefixes always being the same, are writing code specifically for
your data and data format, and don't want to do merges, etc.), in which
case you could maybe do away with including the namespaces object
altogether and document the prefixes separately?

In any case, great to see your data as RDF, however it is serialised!

Keith Alexander

Evan Sandhaus

Nov 21, 2009, 11:53:41 AM
to The New York Times Linked Open Data Community
Keith,

Thank you so much for weighing in on this issue, and my apologies for
characterizing this RDF/JSON convention as purely a Talis effort.

I agree that the abbreviated JavaScript notation is inferior to the
full version, if the intention of the developer is to merge graphs.
If, however, the intention is simply to get a sameAs reference or two
and generate third-party API requests based on these references, then I
think the abbreviated syntax makes life a bit easier. Especially
since, as you suggest, the namespace prefixes for our data are
relatively stable.

That being said, I see multiple use cases for the JSON data, which is
why I think we'll likely go ahead and publish two different JSON
serializations: the full and the abbreviated.

So if you request http://data.nytimes.com/N66220017142656459133.json,
you'd get the full version, and if you request
http://data.nytimes.com/N66220017142656459133_min.json, you'd get the
abbreviated version.
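
In client code the choice would look something like this sketch (the URLs
just follow the proposal above, nothing is live yet, and loadJSON is a
stand-in for whatever XHR or JSONP helper you use):

// Sketch: picking whichever serialization suits the task.
var fullUrl = "http://data.nytimes.com/N66220017142656459133.json";      // full RDF/JSON
var minUrl  = "http://data.nytimes.com/N66220017142656459133_min.json";  // abbreviated form

// e.g. grab the abbreviated version and pull out the sameAs links
loadJSON(minUrl, function (data) {
  var links = data["http://data.nytimes.com/N66220017142656459133"]["owl:sameAs"];
  // ... build DBpedia / Freebase API requests from links
});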

Does this sound reasonable to you?

All the best,

Evan


Keith Alexander

Nov 22, 2009, 1:23:27 PM
to nyt_linked...@googlegroups.com
Hi Evan,

On Sat, Nov 21, 2009 at 4:53 PM, Evan Sandhaus <kan...@gmail.com> wrote:
> Keith,
>
> Thank you so much for weighing in on this issue, and my apologies for
> characterizing this RDF/JSON convention as purely a Talis effort.
>
Please don't apologise; I mean, it does originate from the Talis n2
wiki. I was just trying to clarify that it isn't only Talis that uses
it, and we developed it hoping it could be a structure people would
converge on, which in general I think they have (which is nice).
(That's not to say you shouldn't use something else as well or instead
if that better suits your requirements.)

> I agree that the abbreviated JavaScript notation is inferior to the
> full version, if the intention of the developer is to merge graphs.
> If however, the intention is simply to get a sameAs reference or two
> and generate 3rd party API requests based on these references,  then I
> think the abbreviated syntax makes life a bit easier.  Especially
> since, as you suggest, the namespace prefixes for our data are
> relatively stable.
>
> That being said, I see multiple use-cases for the JSON data, which is
> why I think we'll likely go ahead and publish two different JSON
> serializations: the full and the abbreviated.
>
> So if you request: http://data.nytimes.com/N66220017142656459133.json,
> you'd get the full version and if you request
> http://data.nytimes.com/N66220017142656459133_min.json you'd get the
> abbreviated version.
>
> Does this sound reasonable to you?
>

Eminently. :)

Yours

Keith