This is an issue that we think is at the heart of why RDF has not caught
on as a general data model: the data is very difficult to work with in
programming languages. There is no native data structure for it that is
easy to work with without a complex set of APIs.
When a JavaScript author gets JSON-LD from a remote source, the graph
that the JSON-LD expresses can take a number of different but valid
forms. That is, the information expressed by the graph can be identical,
while each serialization of it is structured differently.
Think of these two statements:
The Q library contains book X.
Book X is contained in the Q library.
The information that is expressed in both sentences is exactly the same,
but the structure of each sentence is different. Structure is very
important when programming. When you write code, you expect the
structure of your data not to change.
However, when we program using graphs, the structure is almost always
unknown, so a mechanism to impose a structure is required in order to
help the programmer be more productive.
The way the graph is represented depends entirely on the algorithm used
to normalize it and the algorithm used to break cycles in it.
Consider the following example, which is a graph with three top-level
objects - a library, a book and a chapter. The items are related to one
another, so the graph can be expressed in JSON-LD in a number of
different ways:
{
  "#":
  {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/vocab#"
  },
  "@":
  [
    {
      "@": "http://example.org/test#library",
      "a": "ex:Library",
      "ex:contains": "<http://example.org/test#book>"
    },
    {
      "@": "<http://example.org/test#book>",
      "a": "ex:Book",
      "dc:contributor": "Writer",
      "dc:title": "My Book",
      "ex:contains": "<http://example.org/test#chapter>"
    },
    {
      "@": "http://example.org/test#chapter",
      "a": "ex:Chapter",
      "dc:description": "Fun",
      "dc:title": "Chapter One"
    }
  ]
}
The JSON-LD graph above could also be represented like so:
{
  "#":
  {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/vocab#"
  },
  "@": "http://example.org/test#library",
  "a": "ex:Library",
  "ex:contains":
  {
    "@": "<http://example.org/test#book>",
    "a": "ex:Book",
    "dc:contributor": "Writer",
    "dc:title": "My Book",
    "ex:contains":
    {
      "@": "http://example.org/test#chapter",
      "a": "ex:Chapter",
      "dc:description": "Fun",
      "dc:title": "Chapter One"
    }
  }
}
Both of the examples above express the exact same information, but the
graph structure is very different. If a developer can receive either of
these objects from a remote source, how do they ensure that they only
have to write one code path to handle both?
That is, how can a developer reliably write the following code:
// print all of the books and their corresponding chapters
// (this only works if "ex:contains" is always an array of embedded objects)
var library = jsonld.toObject(jsonLdText);
for(var bookIndex = 0; bookIndex < library["ex:contains"].length;
    bookIndex++)
{
  var book = library["ex:contains"][bookIndex];
  var bookTitle = book["dc:title"];
  for(var chapterIndex = 0; chapterIndex < book["ex:contains"].length;
      chapterIndex++)
  {
    var chapter = book["ex:contains"][chapterIndex];
    var chapterTitle = chapter["dc:title"];
    console.log("Book: " + bookTitle + " Chapter: " + chapterTitle);
  }
}
The answer boils down to ensuring that the data structure that is built
for the developer from the JSON-LD is framed in a way that makes
property access predictable. That is, the developer provides a structure
that MUST be filled out by the JSON-LD API. The working title for this
mechanism is "Cycle Breaking and Object Framing", since both mechanisms
must work together to solve this problem.
The developer would specify a Frame for their language-native object
like the following:
{
  "#": {"ex": "http://example.org/vocab#"},
  "a": "ex:Library",
  "ex:contains":
  {
    "a": "ex:Book",
    "ex:contains":
    {
      "a": "ex:Chapter"
    }
  }
}
The object frame above asserts that the developer expects to get back a
library containing one or more books, each containing one or more
chapters. This ensures that the data is structured in a predictable way,
so only one code path is necessary to work with graphs that can take
multiple forms. The API call would look something like this:
var library = jsonld.toObject(jsonLdText, objectFrame);
The mechanism in the API and the algorithm that is used to perform cycle
breaking and object framing should be formalized in the JSON-LD
specification.
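A minimal JavaScript sketch of what such a mechanism might do is below. It is a hypothetical illustration, not the JSON-LD API: `frameObject` and its helpers are invented names, it takes an already-parsed object rather than text, and it assumes the syntax used in this message (an "@" IRI on every node, "<iri>" strings as references, prefixes left unexpanded). It flattens the input, in either layout, into one entry per subject and then rebuilds the tree the frame asks for. Properties that carry a sub-frame always come back as arrays, and a subject that is already being embedded further up the same branch is left as a reference, which is one simple way to break cycles.

```javascript
// Hypothetical object-framing sketch; not the JSON-LD API. Assumes the early
// syntax in this thread: "@" holds a subject IRI (optionally in <...>), "a"
// holds a type, and "<iri>" strings are references to other subjects.
function stripBrackets(v) {
  return typeof v === "string" && v.charAt(0) === "<" && v.slice(-1) === ">"
    ? v.slice(1, -1) : v;
}

function isObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

function hasType(subject, type) {
  return !type || (subject.a || []).indexOf(type) !== -1;
}

// Flatten a node (and everything nested in it) into index[iri]. Embedded
// objects are replaced by "<iri>" references; every value becomes an array.
function indexNode(node, index) {
  var id = stripBrackets(node["@"]);
  var flat = index[id] || (index[id] = { "@": id });
  Object.keys(node).forEach(function (key) {
    if (key === "@" || key === "#") return;
    [].concat(node[key]).forEach(function (v) {
      var ref = v;
      if (isObject(v)) {
        indexNode(v, index);
        ref = "<" + stripBrackets(v["@"]) + ">";
      }
      flat[key] = flat[key] || [];
      if (flat[key].indexOf(ref) === -1) flat[key].push(ref);
    });
  });
}

// Rebuild subject `id` in the shape given by `frame`. `path` lists the
// subjects already being embedded on this branch; a reference back to one of
// them stays a "<iri>" string instead of being embedded again.
function embed(id, frame, index, path) {
  var here = path.concat([id]);
  var subject = index[id];
  var out = { "@": id };
  Object.keys(subject).forEach(function (key) {
    if (key === "@") return;
    var values = subject[key];
    var sub = frame[key];
    if (!isObject(sub)) {
      // Not framed: copy the value, unwrapping single values.
      out[key] = values.length === 1 ? values[0] : values;
      return;
    }
    out[key] = values.map(function (v) {
      var ref = stripBrackets(v);
      var target = index[ref];
      if (!target || !hasType(target, sub.a) || here.indexOf(ref) !== -1) {
        return v; // cannot (or must not) embed here: keep the raw value
      }
      return embed(ref, sub, index, here);
    });
  });
  // Framed properties are always arrays, even when the graph has no values.
  Object.keys(frame).forEach(function (key) {
    if (key !== "#" && isObject(frame[key]) && !(key in out)) out[key] = [];
  });
  return out;
}

// Frame every subject whose type matches the top-level frame.
function frameObject(doc, frame) {
  var index = Object.create(null);
  var roots = Array.isArray(doc["@"]) ? doc["@"] : [doc];
  roots.forEach(function (node) { indexNode(node, index); });
  return Object.keys(index)
    .filter(function (id) { return hasType(index[id], frame.a); })
    .map(function (id) { return embed(id, frame, index, []); });
}
```

Under these assumptions, `var library = frameObject(JSON.parse(jsonLdText), objectFrame)[0];` gives the loop above the same shape for both example documents. Returning an array of every matching library, rather than a single object, is a design choice of this sketch.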
-- manu
--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Linked Data in JSON
http://digitalbazaar.com/2010/10/30/json-ld/
Hi Manu.
(Sorry for poor formatting and typos, I'm using my phone)
The regularity problem you describe is a very real one and it's why I was so adamant that RDF/JSON should not contain prefixes nor any nesting of resources. It means that a developer can learn one pattern for programming against the data and it always works.
So, I encourage you to pursue this more in JSON-LD
cheers,
Ian
MQL, the JSON query language used in Freebase, also uses a frame-based approach:
http://wiki.freebase.com/wiki/MQL
Developers seem happy to work with this, and I've often wondered
whether a version, generalised to support RDF, would be an interesting
avenue to explore. I'll be interested to see how this develops in
JSON-LD.
Cheers,
L.
--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh...@talis.com
http://www.talis.com
Interesting idea; it sounds like the old problem of required and
provided interfaces of software components. If I understand correctly, a
developer specifies a required JSON-LD structure by using such a Frame
(a template). Now you may have different providers for data. Do the
providers also define a Frame to specify how the data is delivered?
What would a mapping from provided to required look like? Is it always
possible to re-format a JSON-LD structure automatically according to a
given Frame? If the conversion is not done by the provider, I think
you will need required and provided Frames for an automatic
conversion.
Great idea, but there may be some problems to solve.
Best,
- Fabian
2011/1/22 Manu Sporny <msp...@digitalbazaar.com>:
--
Fabian
> Do the providers also define a Frame to specify how the data is delivered?
I suppose that they could, but I don't think that it would be necessary.
It might be a requirement if digitally signing/verifying JSON-LD
structures is involved. Perhaps, in that case, "provider Frames" may be
specified out of band for specific objects that are to be signed. I
think, in the general case, the consumer of the data should be able to
frame the data however they want to.
> What would a mapping from provided to required look like? Is it always
> possible to re-format a JSON-LD structure automatically according to a
> given Frame? If the conversion is not done by the provider, I think
> you will need required and provided Frames for an automatic
> conversion.
Given a Frame and a list of triples, you should be able to generate an
appropriate JSON-LD structure to work with. Since you can generate a
list of triples from an existing JSON-LD structure, you should be able
to then convert that structure into another one using a Frame. There may
be more optimized mappings/algorithms for converting from one structure
into another, but clearly there is a path forward.
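The first step of that path, generating triples from a JSON-LD structure, can be sketched in a few lines of JavaScript. This is a hypothetical illustration, not part of any JSON-LD API: `toTriples` and `objectTerm` are invented names, the sketch assumes the early syntax used in this thread (an "@" IRI on every node, "<iri>" strings as references), does not expand prefixes, and writes each triple as a plain "subject predicate object" string so that two documents can be compared.

```javascript
// Hypothetical sketch: list the triples of a JSON-LD document written in the
// early syntax used in this thread. Prefixes are left unexpanded.
function stripBrackets(v) {
  return typeof v === "string" && v.charAt(0) === "<" && v.slice(-1) === ">"
    ? v.slice(1, -1) : v;
}

function objectTerm(o) {
  if (o !== null && typeof o === "object") return "<" + stripBrackets(o["@"]) + ">";
  if (typeof o === "string" && o !== stripBrackets(o)) return o; // already "<iri>"
  return JSON.stringify(o); // plain value, including CURIEs such as "ex:Book"
}

function toTriples(doc) {
  var triples = [];
  function visit(node) {
    var s = "<" + stripBrackets(node["@"]) + ">";
    Object.keys(node).forEach(function (p) {
      if (p === "@" || p === "#") return;
      [].concat(node[p]).forEach(function (o) {
        triples.push(s + " " + p + " " + objectTerm(o));
        if (o !== null && typeof o === "object") visit(o); // nested subject
      });
    });
  }
  (Array.isArray(doc["@"]) ? doc["@"] : [doc]).forEach(visit);
  // Sort and drop duplicates so that equal graphs give equal lists.
  return triples.sort().filter(function (t, i, all) {
    return i === 0 || t !== all[i - 1];
  });
}
```

Both example documents from the start of the thread yield the same nine triples, which is the sense in which they carry identical information; filling in a Frame from such a list is the second step.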
I do think Frames can solve the general problem; however, there are some
details that need to be resolved. For instance, if there is data in an
incoming graph that isn't mentioned in your Frame, where does it go in
the structure? Often a developer may simply not care because they won't
be working with that data or don't understand it, in which case maybe
the data should be omitted. Alternatively, there are cases where it's
important even if you aren't directly using the data (e.g., digitally
signing/verifying JSON-LD structures).
Another question is how to specify that certain properties/predicates
should always use JSON arrays, even if there is only one object value. I
can see this being a desirable feature. A hopefully simpler question:
what result do you get from a "framing" that produces an empty graph
(i.e., there were no Frame matches in the graph)?
--
Dave Longley
CTO
Digital Bazaar, Inc.
Phone: 540-961-4469
On 01/23/2011 02:43 PM, Ian Davis wrote:
> The regularity problem you describe is a very real one and it's why I was
> so adamant that RDF/JSON should not contain prefixes nor any nesting of
> resources. It means that a developer can learn one pattern for
> programming against the data and it always works.
Yes, but as we discovered - not all data fits this paradigm. One of the
core goals with JSON-LD was to not require developers to change the way
that they use JSON in most of the cases.
RDF/JSON requires developers to change their JSON data in every case.
Rather than annotating the structure that they already have, RDF/JSON
requires them to change that structure entirely - and we believe that
approach would harm adoption of JSON-LD.
I don't mean to come off as being anti RDF/JSON - I think it works
for applications that are written from scratch by RDF folks. It's fine
for people that think in terms of RDF. However, that is not most of the
world and when we tried to convince our non-RDF developers that this was
the right direction to go, it didn't take long for an angry mob to
gather - complete with pitchforks and torches. :)
Telling developers that they must place everything in a shallow
structure is placing a constraint on them that the larger developer
community will surely reject. Here are some comments we received when
discussing the adoption of shallow structures in JSON-LD:
"I can do so much more with JSON structures, why are you limiting me to
fit in this tiny RDF box?"
"You expect me to use full URLs everywhere? What if I mistype something?"
"Why can't I nest things? I can nest things in almost every static and
dynamic programming language. Why can't I just use an associative array,
why does JSON-LD not let me use the full power of that concept?"
"Why do I need to constrain my input to fit RDF's broken data model?"
(this statement was confusing the model with the data structure, but you
get the gist - developers have a hard time separating the two because
often model and structure are joined at the hip).
... I think you get the gist.
It's not that we didn't try this - we gave it an honest go, and it fell
apart because our developers were so used to working with associative
arrays for most of our programming that adding complexity to how they
think and develop was not an option for our technical team. I would be
surprised if others that are used to programming with JSON didn't have
the same set of complaints.
Now, I certainly admit that RDF/JSON doesn't have this dynamic layout
problem because its data layout is static... and that's a problem we've
had to resort to a simple API to solve in JSON-LD. The merits of that
approach are certainly debatable, depending on the application you're
developing and your developer community.
> So, I encourage you to pursue this more in JSON-LD
We did pursue it and it didn't work out for us. However, we may be
miscommunicating, so please do point out if you see any place where we
might be missing each other's point.
Neat, I wasn't aware of that work - thanks for the link.
I think we're working with three separate concepts here that are very
nuanced in their differences.
1. Object Framing
2. Query by Example
3. Pure Query Languages
From the looks of it, MQL is more of a query language. I haven't quite
worked through all of the nuances, so this explanation may be a bit
muddy, apologies in advance.
I think that when we talk about getting data out of databases, we talk
about two distinct things: the first is how to perform the query, and
the second is how we expect the result to be structured. Taking a simple
SQL query as an example:
SELECT name, age, homepage FROM users WHERE age > 18;
"name, age, and homepage" expresses the structure of the data that
should be returned.
"FROM users WHERE age > 18" expresses the query that should be performed.
Object Framing is meant to purely express the structure of the data that
should be returned (at least, that's the current design).
Pure Query Languages are meant to only express the query and not the
structure of the data that is returned.
Query By Example is a hybrid of Object Framing and Pure Query Languages
where the structure that is given is both the query and the structure in
one.
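Going back to the SQL statement above, here is a small, hypothetical JavaScript sketch that keeps its two halves apart: a predicate function for the query and a list of field names for the structure. The function name `selectWhere` is invented for illustration and is not part of any JSON-LD or database API.

```javascript
// Hypothetical illustration of the two halves of
//   SELECT name, age, homepage FROM users WHERE age > 18;
// `where` plays the role of the query (which rows), `fields` plays the role
// of the structure (which properties each returned object has).
function selectWhere(rows, fields, where) {
  return rows.filter(where).map(function (row) {
    var out = {};
    fields.forEach(function (field) { out[field] = row[field]; });
    return out;
  });
}
```

In these terms, Object Framing supplies only `fields`, a pure query language supplies only `where`, and Query by Example supplies a single object that acts as both.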
To explain further, here are a number of examples of the nuances between
each:
Object Framing Example
----------------------
{
  "#": {"ex": "http://example.org/vocab#"},
  "a": "ex:Library",
  "ex:contains":
  {
    "a": "ex:Book",
    "ex:contains":
    {
      "a": "ex:Chapter"
    }
  }
}
The entire frame doesn't have to match for an object to be framed.
Perhaps your query returns only a library with no books. Perhaps you get
back a library with a bunch of books, but no chapters for any of them.
Keep in mind that Object Framing isn't just useful for queries, it's
also useful for laying out graphs that have an arbitrary layout into a
known layout that is asserted by the developer. That is, it is a way of
forcing a deterministic graph layout on a graph that has an unknown layout.
Given the frame above, the following would be a valid return object
matching the Object Frame:
{
  "#": {"ex": "http://example.org/vocab#"},
  "a": "ex:Library",
  "ex:contains":
  {
    "a": "ex:Book",
    "dc:title": "Tales of Greenville"
  }
}
Note that even though there was no chapter, the result is still
considered valid. This is true for Object Framing, but isn't true for
Query by Example.
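The framing rule just described (an embedded object that is present must fit its sub-frame, but a property named in the frame may be missing entirely) can be sketched as a small check. This is a hypothetical JavaScript illustration: `fitsFrame` is an invented name, types are compared as plain strings without prefix expansion, and values under a framed property are expected to be embedded objects.

```javascript
// Hypothetical Object Framing check: missing leaves are allowed.
function isObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

function fitsFrame(obj, frame) {
  // A type named in the frame must appear among the object's types.
  if (frame.a && [].concat(obj.a).indexOf(frame.a) === -1) return false;
  return Object.keys(frame).every(function (key) {
    if (key === "#" || !isObject(frame[key])) return true;
    if (!(key in obj)) return true; // property absent: still a valid framing
    // Every embedded value present must itself fit the nested frame.
    return [].concat(obj[key]).every(function (v) {
      return isObject(v) && fitsFrame(v, frame[key]);
    });
  });
}
```

Under this check, the "Tales of Greenville" result above is accepted even though its book has no chapters.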
Query By Example ... err, Example
---------------------------------
For Query by Example, we assert that the object that is given as the
query MUST be completely matched in order for the query to return
successfully.
{
  "#": {"ex": "http://example.org/vocab#"},
  "a": "ex:Library",
  "ex:contains":
  {
    "a": "ex:Book",
    "ex:contains":
    {
      "a": "ex:Chapter",
      "dc:title": "A Hard to Find Chapter"
    }
  }
}
The query above is looking for all libraries that contain books with a
chapter titled "A Hard to Find Chapter". The assumption made above is
that the subjects will be filled out in the returned object (along with
possibly all data known for each object). However, at a minimum, all
properties expressed in the query object MUST exist in order for a match
to be found.
That is, whereas Object Framing states that the structural elements at
the leaves of the tree are optional, Query By Example states that the
entire structure and all properties are required.
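A hypothetical JavaScript sketch of the Query by Example rule: every property in the query, at every depth, must be present in the candidate object. Literal values must be equal, and for a nested query object at least one embedded value has to match. The name `matchesExample` is invented for illustration, and prefixes are compared as plain strings.

```javascript
// Hypothetical Query by Example matcher: the whole query must match.
function isObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

function matchesExample(obj, query) {
  return Object.keys(query).every(function (key) {
    if (key === "#") return true; // the context is not part of the pattern
    if (!(key in obj)) return false; // a missing property means no match
    var want = query[key];
    return [].concat(obj[key]).some(function (v) {
      return isObject(want) ? isObject(v) && matchesExample(v, want) : v === want;
    });
  });
}
```

Run against the "Tales of Greenville" result above, this matcher returns false, because the book carries no chapter at all.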
Pure Query Languages
--------------------
Pure query languages (as defined here) don't have any structural
information associated with them. They can, however, be paired with
Object Framing to get a desired result. For example (in this fake query
language that looks like SPARQL), we specify that a frame will be
provided for the given query:
var q = "GET ?frame WHERE { dc:title == 'A Hard to Find Chapter' }";
var f = {
  "#": {"ex": "http://example.org/vocab#"},
  "a": "ex:Library",
  "ex:contains":
  {
    "a": "ex:Book",
    "ex:contains":
    {
      "a": "ex:Chapter"
    }
  }
};
var libraries = graphStore.query(q, f);
So, that's my thinking on our current approach. Dave will have to chime
in where his thinking differs or diverges, as we've played around with
mixing each of these in different ways to get a different set of
benefits/drawbacks.
> Developers seem happy to work with this, and I've often wondered
> whether a version, generalised to support RDF, would be an
> interesting avenue to explore. I'll be interested to see how this
> develops in JSON-LD.
Good to hear... we're going to be proposing this work in the RDF Working
Group at the W3C for the JSON/RDF work, so we'll get broader feedback as
time goes on.
>> So, I encourage you to pursue this more in JSON-LD
>
> We did pursue it and it didn't work out for us. However, we may be
> miscommunicating, so please do point out if you see any place where we
> might be missing each other's point.
I think probably we are. My email wasn't supposed to promote RDF/JSON
over JSON-LD, just to indicate that similar thinking went into it, and
that I think it's valuable to have regularity for developers to program
against. I think the two formats are tackling different problems.
I should have been clearer and said: So, I encourage you to pursue
_regularity and predictability of structure_ more in JSON-LD
Sorry for any confusion.
>
> -- manu
>
Ian
Ah, then we were definitely miscommunicating. I do think the two
formats are tackling different problems, so it's good that we agree
there.
> I should have been clearer and said: So, I encourage you to pursue
> _regularity and predictability of structure_ more in JSON-LD
Ah, yes - that's what we're attempting to do. We've implemented this
cycle-breaking/object-framing stuff internally and it seems to work
fairly well so far... we'll see if it holds up in a complete,
functioning system. I don't see any reason why it wouldn't, but there
may be a use case that we haven't covered.
> Sorry for any confusion.
Not at all - thanks for taking the time to give us your input. :)
-- manu
--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Towards Universal Web Commerce
http://digitalbazaar.com/2011/01/31/web-commerce/