The JSON-LD context and type coercion

Manu Sporny

Oct 16, 2010, 5:30:57 PM
to JSON-LD
This is a chat that Mark and I had over Skype today concerning Richard's
dare, type coercion, context expression, nesting, the conceptual model
and a few other JSON-LD related things:

Mark Birbeck: I'm just responding to your email earlier...I think the
mailing-list will work quite well as a way to 'think out loud' on some
of these things.
Mark Birbeck: I like Richard's dare. :)
Manu Sporny: I've worked out how to make it happen, just need to add it
to the spec.
Manu Sporny: A section on type coercion
Manu Sporny: but I disagree that it should go in the context.
Mark Birbeck: Ok.
Manu Sporny: or maybe it could go in the context... too many things in
my head to think about it clearly at this point in time.
Mark Birbeck: No worries.
Mark Birbeck: :)
Manu Sporny: Well, the only reason it's happening now is because we
needed a solution to the problem of semantic APIs for our system.
Manu Sporny: We are consuming RDFa
Manu Sporny: and one of the first problems you come up against
Manu Sporny: is how do you make a Linked Data Web Service?
Manu Sporny: and since people love JSON for web services - it seemed to
be the most natural solution.
Manu Sporny: (but I'm also trying to balance it against having a
shipping product - so, I'm fine with taking Richard's dare - but, I
don't want it to hang us up)
Mark Birbeck: Yes. JSON is just so handy.
Manu Sporny: just trying to be very clear about our motivations.
Mark Birbeck: That's why I said to him that I think we have plenty for a
'first version', and a good foundation to build on.
Mark Birbeck: We can already do useful stuff with what we have.
Manu Sporny: From our point of view: We need something that works in the
next month - and any potential changes can't disrupt our timeline
Manu Sporny: that's not to say that others will view it the same way :)
Mark Birbeck: Well, for the current 'version' I think we're nearly there.
Manu Sporny: I hope so :)
Manu Sporny: my engineers groan every time I tell them that there might
be /something/ that changes with JSON-LD in the future. :)
Mark Birbeck: I'm happy to leave out the cleverness of type coercion for
another version.
Manu Sporny: I think we can mention it
Manu Sporny: it doesn't throw too much of a wrench in the works.
Mark Birbeck: In fact there are only a couple of things I'd want to add
to the current set-up. ;)
Manu Sporny: I just want to make sure that we get CURIE processing
figured out - whether or not it is in object.
Manu Sporny: I'm fine w/ adding __base__ as long as it doesn't
complicate things.
Mark Birbeck: The first is that I think the mappings should be nested
inside the context.
Mark Birbeck: That way we have the freedom to add other properties of a
context -- base, coercion rules, stuff we haven't thought of -- without
risking collision with a token.
Mark Birbeck: Actually, if we had that change now, I could even live
without base, etc., because I'd know that we can slip it in easily.
Manu Sporny: sure, we could do that
Manu Sporny: or we could reserve all names that start with "__"
Manu Sporny: "__vocab__", "__base__", "__coercion__" ?
* Manu Sporny was trying to avoid deep nesting.
Mark Birbeck: I'm not keen on the "___x___" approach because it doesn't
'look like' anything else we've done.
Mark Birbeck: I could live with it, of course. :)
Manu Sporny: I'm fine w/ @vocab too
Manu Sporny: or @base
Manu Sporny: @coercion
Mark Birbeck: I was thinking more of:
Mark Birbeck: {
Mark Birbeck: base: ...,
Mark Birbeck: types: ...,
Mark Birbeck: tokens: { ... }
Mark Birbeck: }
Manu Sporny: right, and the only argument I'd have against that is the
deeply nested structures.
Manu Sporny: I /could/ live with it
Manu Sporny: but I was trying to make it quick to write
Mark Birbeck: Yes, agreed.
Manu Sporny: and JSON nesting is one of the places where I see the most
errors
Mark Birbeck: One trick I used a lot in my library...but you might not
like this...
Manu Sporny: when writing the syntax... trying to find that one missing
} or , is annoying.
Mark Birbeck: Actually...wouldn't work. Forget that...
Mark Birbeck: Yes.
Mark Birbeck: Although one of the reasons I started to use RDFj more
than RDFa in my applications was that you get syntax highlighting in
editors.
Manu Sporny: I think the likelihood that someone will use '@' to start
one of their prefixes or tokens is minor.
Mark Birbeck: Also you can do JSLint etc.
Mark Birbeck: Yes, true.
Manu Sporny: or '#base', '#types'
Mark Birbeck: Overloading '#'.
Manu Sporny: right, being able to JSLint is a good thing - syntax
highlighting is a good thing.
Mark Birbeck: Mmm...
Mark Birbeck: Too trixy to append ':' to tokens I guess?
Mark Birbeck: {
Mark Birbeck: base: ...,
Mark Birbeck: "name:": "..."
Mark Birbeck: }
Mark Birbeck: Difficult to see.
* Manu Sporny agrees.
Mark Birbeck: So...choices are (a) to prefix properties (e.g.,
underscore, '#', etc.), (b) to prefix tokens (e.g., '#' again), or (c)
to make tokens into a nested list.
Manu Sporny: Somewhat related: I was thinking that we could do type
coercion in two ways: 1) stuff the rules into the context, or 2) create
a new "%" thing and put it at the top level.
Manu Sporny: so type coercion could live outside of the context
Mark Birbeck: Would prefer to see it in context.
Mark Birbeck: Nothing to stop us combining contexts.
Manu Sporny:
{
"#": {...},
"%": {"foaf:homepage": "IRI", ...}
...
}
Manu Sporny: that way, we could use the context for the coercion rules?
Manu Sporny: (I don't really like that solution)
Mark Birbeck: You only need two properties: IRI and literal.
Mark Birbeck: Each contains a list of predicates.
Manu Sporny: three properties, maybe more? typed literal, language
literal, plain literal, IRI?
Mark Birbeck: True.
Mark Birbeck: Could also extend it to include any datatype, as it happens.
Manu Sporny: right
Mark Birbeck: I.e., these predicates are dates.
* Manu Sporny nods.
Mark Birbeck: So that bit's easy...it's either a list like you say, or a
list like I say...not much between the two.
Mark Birbeck: But the next level up...
Mark Birbeck: I was thinking of "types" (or "_types" or "#types") the
property which contains the list...but I guess "%" is just as good.
Mark Birbeck: I'm just wondering if all of these characters might get
confusing.
Mark Birbeck: Do we invent another one for 'base' for example?
Manu Sporny: ok, so seems that we more-or-less agree on something like:
{
"#":
{
"dcterms": "http://purl.org/dc/terms/",
"%": {"dcterms:published": "xsd:dateTime", ...}
}
}
Manu Sporny: yeah, characters are going to get confusing.
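A minimal sketch of how coercion rules nested under "%" inside the "#" context, as in the example above, might be applied to an object. The helper name coerceValues, the { value, datatype } output shape, and the sample property values are assumptions made for illustration; nothing here is taken from a JSON-LD draft.

```javascript
// Hypothetical sketch: apply "%" coercion rules found inside the "#" context.
// The function name and the { value, datatype } output shape are assumptions.
function coerceValues(doc) {
  const context = doc["#"] || {};
  const rules = context["%"] || {};
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "#") continue; // the context itself carries no data
    result[key] = Object.prototype.hasOwnProperty.call(rules, key)
      ? { value: value, datatype: rules[key] } // a rule exists for this predicate
      : value; // no rule: leave the value as written
  }
  return result;
}

const doc = {
  "#": {
    "dcterms": "http://purl.org/dc/terms/",
    "%": { "dcterms:published": "xsd:dateTime" }
  },
  "dcterms:published": "2010-10-16T17:30:57Z",
  "dcterms:title": "The JSON-LD context and type coercion"
};

const coerced = coerceValues(doc);
```

In this sketch only "dcterms:published" picks up a datatype; "dcterms:title" stays a plain string because no rule names it.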
Manu Sporny: so, another reason for the characters is for the streaming
case.
Manu Sporny: we're working with graph signatures as well - using PKI to
sign JSON-LD graphs
Manu Sporny: (we need to do this for expressing contracts, licenses and
payment information in a decentralized way)
Manu Sporny: and one of the things with JSON-LD, at least with the way
that it is designed right now
Mark Birbeck: Right.
Manu Sporny: is that you can't start generating triples unless you have
the context and subject.
Mark Birbeck: (It's a great use-case.)
Mark Birbeck: Right...well that's an interesting issue separate from this.
Manu Sporny: and the reason I picked "@" and "#" is that if you
alphabetize your output (sort the output by key), "#" is going to come
first, followed by "@", followed by all the keys.
Mark Birbeck: So, one way of looking at the context is to see it as
something that you need to have in place before you start parsing.
Manu Sporny: and our JSON-LD graph signing algorithm states, "normalize
the JSON-LD, sort the keys alphabetically, and serialize without space
separation between the JSON syntax, then sign."
Manu Sporny: right
Mark Birbeck: Another way to look at it is that it's something that can
be 'applied' to a triple-store to 'fix it up'.
Manu Sporny: and if you look at the current processing algorithm - there
is a "list of held objects", which you need to use if the "#" doesn't
come first.
Mark Birbeck: Right.
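The "sort the keys, serialize without whitespace" step can be sketched roughly as follows. This is only an illustration under assumptions: the function name canonicalize and the example.org subject URL are made up, normalization and the signature itself are left out, and the real signing algorithm may differ.

```javascript
// Hypothetical sketch of "sort the keys alphabetically and serialize without
// space separation". Normalization and signing are out of scope here.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort(); // default sort compares UTF-16 code units
    const members = keys.map(
      (k) => JSON.stringify(k) + ":" + canonicalize(value[k])
    );
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

const serialized = canonicalize({
  "name": "Manu Sporny",
  "@": "http://example.org/people#manu",
  "#": { "name": "http://xmlns.com/foaf/0.1/name" }
});
```

Because "#" (0x23) sorts before "@" (0x40), which sorts before letters, the context and subject end up ahead of the ordinary keys in the output, which is the property that matters for the streaming case. Keys that start with a digit or with punctuation below "@" would break that ordering, so this relies on such keys not being used.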
Manu Sporny: so, back to your point about the symbols being difficult to
remember
Manu Sporny: I agree
Manu Sporny: however, we don't want to pick just anything (because of
this issue)
Mark Birbeck: But we could devise a mode where a parser simply transfers
everything it sees into the triple-store, untouched.
Manu Sporny: we want to make serializing these graphs easy.
Mark Birbeck: Right.
Manu Sporny: yes, but that information would need to be fixed up at some
point
Manu Sporny: and what happens if you already have triples in the graph?
Mark Birbeck: Yes...the moment you see the context.
Manu Sporny: I agree that it could be done that way.
Mark Birbeck: But anyway, doesn't the "#" give you everything you need?
Mark Birbeck: Provided that comes first, it doesn't matter what's inside?
Manu Sporny: yes
Mark Birbeck: So "%" doesn't gain you anything?
Manu Sporny: the only thing it gains
Manu Sporny: is that it comes after "#" in the ascii table
Manu Sporny: So, no "%" doesn't really gain us anything other than
separating type coercion from context - and the only reason we'd want to
do that is so we could make statements like this:
{
"@context": { "dcterms:created": "xsd:dateTime"},
}
Mark Birbeck: You mean add those mappings outside of a context?
Manu Sporny: sorry, no
Manu Sporny: forgot the context.
Mark Birbeck: I follow what you mean, though (I think), that you want to
add these rules without adding a context?
Manu Sporny: Sorry, I meant yes - confusing myself now, this is what I
meant:
{
"#": {...},
"@context": { "dcterms:created": "xsd:dateTime"},
}
Mark Birbeck: Right.
Manu Sporny: However, there's nothing that would prevent us from moving
it into the context
Manu Sporny: yes
Mark Birbeck: But instead of '@context' you mean '@types' or something, no?
Manu Sporny: yes
Manu Sporny: sorry :(
Mark Birbeck: np
Mark Birbeck: Two things come to mind.
Manu Sporny: The other piece of information is optimizing for the common
case (in the long term)
Manu Sporny: I think more people will want to specify prefix/token
mappings than will want to specify type coercion rules.
Mark Birbeck: The first is that the nice thing about "#" is that it
groups everything together, and says "here are the things that turn this
ordinary object into a semantic object".
* Manu Sporny nods.
Mark Birbeck: I like your change from the token "context" to "#" because
it's reminiscent of "$" in jQuery. Not as a shorthand, but in the way
that it sums up a whole 'approach'...a new way of thinking.
Manu Sporny: so, in my mind, it's okay to nest all the type coercion
rules in "#", but it's not okay to nest all the prefix/token mappings in
"#" under something like "tokens"
Mark Birbeck: So I'd be strongly in favour of putting everything we can
into the context.
Manu Sporny: I think I'm coming around to that mode of thinking.
Mark Birbeck: It's a great encapsulation.
Manu Sporny: #('foaf') -- not really, but that's what it could look like
Mark Birbeck: "#" is so succinct, and yet in time it will sum up this
whole approach.
Mark Birbeck: So, the second thing that it brings to mind is that we
should allow a context to contain a context, and the resulting context
is simply the aggregate of the whole.
=== "xsd:dateTime"
Mark Birbeck: Then you could have:
Manu Sporny: "context to contain a context"?
Mark Birbeck: {
"types": { "dcterms:created": "xsd:dateTime" }
}
Mark Birbeck: or rather:
Mark Birbeck: var types = {
"types": { "dcterms:created": "xsd:dateTime" }
};
Mark Birbeck: And then:
Mark Birbeck: var context = { ... };
Mark Birbeck: then:
= types;
= context;
Mark Birbeck: What I'm getting at is that you might have one context
that contains all of your type mappings and another context that
contains a load of tokens, but you want to apply both to some JSON objects.
Manu Sporny: I'm afraid that if we do that, people will confuse the
difference between a context and a "context for a context".
Mark Birbeck: I.e., you want a context that represents the sum of both
parts.
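One way to picture "a context that represents the sum of both parts" is a small merge helper. This is a sketch under assumptions: mergeContexts is a hypothetical name, "later contexts win" is an arbitrary conflict rule, and only the "types" key is merged one level deep.

```javascript
// Hypothetical sketch: build one context that is the aggregate of several.
// Later contexts win on conflicts; the "types" key is merged one level deep.
function mergeContexts(...contexts) {
  const merged = {};
  for (const ctx of contexts) {
    for (const [key, value] of Object.entries(ctx)) {
      if (key === "types") {
        // Combine coercion rules instead of replacing the whole object.
        merged.types = Object.assign({}, merged.types, value);
      } else {
        merged[key] = value;
      }
    }
  }
  return merged;
}

const types = { "types": { "dcterms:created": "xsd:dateTime" } };
const tokens = { "dcterms": "http://purl.org/dc/terms/" };
const combined = mergeContexts(types, tokens);
```

Here combined carries both the "dcterms" prefix and the "types" rules, while the two source contexts stay usable on their own.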
Mark Birbeck: Possibly.
Manu Sporny: I get what you're saying - but I think it'll confuse people
if we use that terminology
Mark Birbeck: And you might argue that the 'context for a context'
is more useful.
Manu Sporny: how about "a context can contain type coercion rules"
Manu Sporny: or "a context can contain a base URI" or "a context can
contain a default prefix"?
Manu Sporny: and the way you specify those is via "#types", "#base" and
"#vocab" ?
Mark Birbeck: That's fine by me...I was only trying to see if there was
a way to allow you to define your coercion rules separately, but without
them having to exist at the top level of an object.
Mark Birbeck: In RDFj the default vocabulary is just "". That saves one
property. :)
Manu Sporny: "" confused our engineers :)
Mark Birbeck: Really?
Mark Birbeck: Too difficult to see?
Manu Sporny: We had to define what "" meant.
Mark Birbeck: How do you mean?
Manu Sporny: "Is it the empty string, a blank string, what does it mean
if I use that for an IRI"? -- this was talking about RDFa.
Mark Birbeck: :)
Manu Sporny: but I found that they're used to "" meaning different
things in different languages.
Mark Birbeck: Yes, that's right.
Manu Sporny: "Is that /null/ in RDFa? How do you do null in RDFa? What
do you mean it means 'this document' - it's the empty string how can it
mean anything?"
Manu Sporny: found that it's easier when we just name something
Mark Birbeck: Fair enough.
Manu Sporny: that way they can call it something when referring to it.
Mark Birbeck: I would prefer to keep tokens and directives/properties
separate, I have to say.
Manu Sporny: and I agree with you in principle.
Manu Sporny: but I'm also concerned about how easy it is to author this
stuff for the standard case.
Manu Sporny: our guys get very annoyed when they have to write extra
HTML just to accommodate the RDFa they're writing.
Mark Birbeck: And you think that's when we have a couple of tokens and
nothing else?
Manu Sporny: If we're looking long-term, I think so.
Manu Sporny: if we're looking short-term, I don't know.
Manu Sporny: so, let's say JSON-LD is successful
Manu Sporny: and people aren't using type coercion because they don't
have to
Manu Sporny: because people have bought into writing this stuff "the
right way"(TM)
Manu Sporny: what are most of those objects going to look like?
Manu Sporny: I'm asserting that they're not going to have type coercion
rules in them
Manu Sporny: but most of them are going to have prefix/token mappings.
Manu Sporny: If that's the case, how do we make sure that the
prefix/token mappings are easy to express?
Mark Birbeck: You're probably right.
Manu Sporny: One way to ease expression is to not require two levels of
nesting.
Manu Sporny: in other words, I'd be annoyed if I have two levels of
nesting in most cases when there is no need to have two levels of nesting.
Mark Birbeck: Yes, definitely agreed.
Manu Sporny:
{
"#": { "#tokens": { "name": "http://xmlns.com/foaf/0.1/name" }, },
}
Mark Birbeck: Could flip it on its head...
Manu Sporny: vs.
Manu Sporny: {
"#": { "name": "http://xmlns.com/foaf/0.1/name" },
}
Mark Birbeck: "#" contains a list of token mappings, plus a single
property which contains all of the directives/properties in one object.
Mark Birbeck: No...not too sure on that...
Mark Birbeck: It would be this, by the way:
Mark Birbeck: {
"#": { "tokens": { "name": "http://xmlns.com/foaf/0.1/name" } }
}
Mark Birbeck: Not that it changes your argument. :)
Manu Sporny: right
Mark Birbeck: But the only point of prefixing with '#' is if the tokens
are mixed in.
Manu Sporny: right
Manu Sporny: I think that if we mix in a couple of "#XYZ" directives,
that it's not a big deal because those are probably not going to be used
for prefixes (I can't see the use case)
Manu Sporny: /and/
Manu Sporny: "#types" should drop away in time.
Manu Sporny: leaving just the prefix/token mappings.
Mark Birbeck: It's not so much that...it's more a case of
conceptualising this stuff.
Mark Birbeck: I don't think 'types' will drop away, I think it will be
used as much as tokens, to be honest. But I do think that as people are
getting used to this, the first use will be tokens on their own.
Manu Sporny: Hadn't thought of this before, but "#" is somewhat similar
to compiler directives "#include", "#pragma", etc.
Mark Birbeck: But on the conceptualisation thing...
Mark Birbeck: That's why I used the word 'directive'. :)
Mark Birbeck: It made me think exactly the same thing.
Mark Birbeck: So what I mean by conceptualising is 'how do we explain this'.
Mark Birbeck: Is it a list of token mappings, but some tokens are reserved?
Manu Sporny: right, and I think I understand your position on that.
Mark Birbeck: Or is it a list of 'compiler directives' of which one type
is to define a token?
Manu Sporny: it's easier to conceptualize when they're broken into
completely different keys/names.
Manu Sporny: right
Mark Birbeck: For example, we could say that it's a list of directives.
Mark Birbeck: We have #base, #types, etc.
Mark Birbeck: We also have #token.
Mark Birbeck: { "#token name": "http://foaf..." }
Mark Birbeck: And then we say that "name" is a shorthand for "#token name".
Mark Birbeck: Something like that...
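The "list of directives" reading can be sketched as a rewrite from the short form to the verbose form. The helper name toVerboseForm and the example.org base value are assumptions for illustration, and the "#token " spelling simply follows the example in the chat.

```javascript
// Hypothetical sketch: treat every context entry as a directive, where a bare
// key such as "name" is shorthand for "#token name".
function toVerboseForm(context) {
  const verbose = {};
  for (const [key, value] of Object.entries(context)) {
    const isDirective = key.startsWith("#"); // "#base", "#types", "#token x", ...
    verbose[isDirective ? key : "#token " + key] = value;
  }
  return verbose;
}

const verbose = toVerboseForm({
  "#base": "http://example.org/",
  "name": "http://xmlns.com/foaf/0.1/name"
});
```

Authors would keep writing "name"; the verbose form only exists so that the context can be explained as one uniform list.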
Manu Sporny: ahh
Mark Birbeck: I'm not saying that's the answer. :)
Manu Sporny: interesting.
Mark Birbeck: I'm just trying to make things consistent.
Mark Birbeck: Then they're easier to explain.
Manu Sporny: my concern is that the clean conceptual model will lead to
overly-verbose syntax in the majority of cases... but I do see how that
makes it more consistent.
Manu Sporny: right
Manu Sporny: I do like that.
Mark Birbeck: My point is that no-one would ever use the verbose form.
* Manu Sporny nods.
Mark Birbeck: It's just that you can then say 'this is a list of
directives'.
Mark Birbeck: Alternatively you say 'this is a list of mappings, but all
mappings that begin with '_' are reserved'.
* Manu Sporny wonders if we should apply this to "@"
Manu Sporny: so instead of "@", we have "#subject"?
Mark Birbeck: I.e., you stick with "name" and then use "_base".
Mark Birbeck: You mean we have a verbose form for everything, and then
find suitable abbreviations?
Mark Birbeck: I.e., "@" is a shorthand for "#subject"?
Manu Sporny: perhaps... seems strange.
Mark Birbeck: Depends.
Mark Birbeck: That's usually how I approach modelling new ideas.
Manu Sporny: or, everything that is a directive starts with "@"
Mark Birbeck: I prefer "#" if we're going the directive route.
Manu Sporny: "@types", "@base", "@subject", etc.
Manu Sporny: @ by itself, means @subject.
Mark Birbeck: Yes, just realised what you meant.
Mark Birbeck: Thing is, that's outside of the context.
Manu Sporny: and we can't do that with "#" by itself, because it doesn't
mean 'subject'...
Mark Birbeck: It could if it was a string.
Manu Sporny: thinking out loud - don't know where this is headed.
Mark Birbeck: {
"#": "<url>",
"name": "Manu Sporny"
}
Manu Sporny: "#": "<profile document>" ?
Mark Birbeck: It means you'd check the JSON type of "#".
Mark Birbeck: Ah...no...I was meaning subject.
Mark Birbeck: But profile might be better. :)
Manu Sporny: right, but I was thinking we'd want to reserve that for
profile documents.
Mark Birbeck: I was toying with whether you might move the subject into
the context.
Mark Birbeck: The most basic context you could have would be a URI.
Mark Birbeck: But it doesn't really work.
Mark Birbeck: I'm coming round to the underscore.
Mark Birbeck: What if we say that all we are defining is a list of tokens.
Mark Birbeck: There's no difference between tokens...they all get
processed the same way, in that they get stored in a list, and if they
appear in a CURIE position they get mapped.
Mark Birbeck: However, the JSON-LD parser will use some of these values to
guide its processing.
Mark Birbeck: And by convention we'll say that tokens that such a parser
needs will begin with an underscore.
Mark Birbeck: So there's no such thing as directives, or anything like that.
Mark Birbeck: They're basically just variables.
Mark Birbeck: That feels consistent, to me.
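A rough sketch of the "everything is a variable" model: every entry is stored the same way, and a string value is substituted when its name shows up as a whole token or as a CURIE prefix. The function name expandTerm, the "_base" entry, and the example.org URLs are assumptions made for illustration.

```javascript
// Hypothetical sketch of the "everything is a variable" model. Every entry in
// "#" is stored the same way; string values are substituted when their name
// appears as a whole token or as a CURIE prefix. The leading underscore on
// "_base" is only a naming convention in this sketch.
function expandTerm(term, variables) {
  const lookup = (name) =>
    Object.prototype.hasOwnProperty.call(variables, name) &&
    typeof variables[name] === "string"
      ? variables[name]
      : null;

  const whole = lookup(term);
  if (whole !== null) return whole; // plain token such as "name"

  const colon = term.indexOf(":");
  if (colon > 0) {
    const prefix = lookup(term.slice(0, colon));
    if (prefix !== null) return prefix + term.slice(colon + 1); // CURIE
  }
  return term; // nothing to map: leave the value untouched
}

const variables = {
  "_base": "http://example.org/",
  "foaf": "http://xmlns.com/foaf/0.1/",
  "name": "http://xmlns.com/foaf/0.1/name"
};

const expandedName = expandTerm("name", variables);
const expandedCurie = expandTerm("foaf:homepage", variables);
const untouched = expandTerm("http://example.org/about", variables);
```

Nothing in the lookup treats "_base" specially; a processor would simply read it by name when it needs it, which is the point of the model.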
Manu Sporny: I don't see how that applies to type coercion?
Mark Birbeck: And it also just means that "#" defines a list of variables.
Mark Birbeck: That's just another variable, isn't it?
Mark Birbeck: Just so happens the variable is an object or an array.
Manu Sporny: "_type dc:created" : "xsd:dateTime" ?
Mark Birbeck: No, you could still have:
Mark Birbeck: "_type": { "dc:created" : "xsd:dateTime" }
Mark Birbeck: I'm just messing around with how we conceptualise this,
still. :)
Mark Birbeck: Trying to answer the question 'what is a context'?
Mark Birbeck: I'm suggesting that a context is simply 'a collection of
tokens or variables'.
Manu Sporny: "A context is something that influences the JSON-LD
processor's behavior"
Mark Birbeck: Well, you could say that, but everything influences the
processor. :)
Manu Sporny: "As an analogy, it is a set of glasses that the JSON-LD
processor wears to view data in different ways."
Manu Sporny: The data doesn't influence the processor - does it?
Manu Sporny: The processor interprets the data using a context.
Manu Sporny: and depending on the context, the interpretation can be
different.
Manu Sporny: and the interpretation results in a graph of information.
Mark Birbeck: If I used "name" in my JSON-LD I wouldn't see it as
influencing the processor, because I would see it as being a simple
mapping. But of course, under the hood it *must* influence the
processor. That's what I mean by 'everything does'.
Manu Sporny: I see
Manu Sporny: well, you've convinced me to move away from underscores :)
Manu Sporny: but the lead character doesn't matter as much, imho
Manu Sporny: it can be "#" or "_" or something else.
Mark Birbeck: In my arguments for underscores I've convinced you to move
away? :)
Manu Sporny: I think what we've come to is that we're putting all of
this stuff into the context.
Mark Birbeck: Yes, I think so.
Manu Sporny: no, you convinced me to move away from "__vocab__"
Manu Sporny: and things of that nature
Mark Birbeck: Ah...great.
Manu Sporny: and I like "#" as a "processor directive"
Manu Sporny: and I like that everything is in the context.
Manu Sporny: and I think I like "#term name" (or whatever the example was)
Manu Sporny: so we can just use "name" and the leading "#term " is assumed.
Mark Birbeck: I think we might be close though. Seems to me that we've
got either (a) a processor directive model, with prefixes having a
special shorthand, or (b) 'everything is a variable' model, and any
variable that the processor uses to control processing will begin with
an underscore.
Mark Birbeck: Yes.
* Manu Sporny nods.
Manu Sporny: perhaps we can leave it here and have a think about it over
the next couple of days.
Mark Birbeck: I can go either way. :)
Mark Birbeck: Yes.
Manu Sporny: me too
Manu Sporny: (at least, right now)
Mark Birbeck: I think the main thing we've concluded is that we're going
to pile everything into the context. I.e., all future improvements and
features will almost certainly amount to extensions to the context.
Manu Sporny: right
Mark Birbeck: That's a good place to be, because it means we are close
to a very solid foundation.
Mark Birbeck: (I'm warming to "#" as being easier to read, and even
thinking that processor developers could use "##" for their own
experimental features, which is easier to read than "__".)
Manu Sporny: interesting, I like "##"
Manu Sporny: for experimental features
Mark Birbeck: Mmm...or even just variables you want passing in to the
processor.
Mark Birbeck: Just realised that in my work I ended up doing a lot of
callback routines.
Mark Birbeck: The context should allow me to set variables that are
available to my code.
Mark Birbeck: Anyway...good work.
Mark Birbeck: We're still moving. :)
* Manu Sporny nods.
Manu Sporny: yes, thanks for the chat - it helps a great deal.
Mark Birbeck: Thank-you, as well...much appreciated.

-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Saving Journalism - The PaySwarm Developer API
http://digitalbazaar.com/2010/09/12/payswarm-api/

Richard Cyganiak

Oct 18, 2010, 1:38:43 PM
to jso...@googlegroups.com
Interesting discussion!

I agree that everything should go into the context.

Have you considered something like this?

"#": {
"foaf":"http://xmlns.com/foaf/0.1/",
"name":"http://xmlns.com/foaf/0.1/name",
"homepage":{"@":"http://xmlns.com/foaf/0.1/homepage","type":"url"},
"birthday":{"@":"http://example.com/ns#birthday","type":"xsd:date"}
}

The ideas here are:

a) the context is a collection of terms
b) a term can have multiple properties, including a URI and a type and
perhaps lots of others
c) the URI of a term is specified with "@", in analogy to outside of
the context
d) as a shorthand, instead of "foo":{"@":"http://foo/"} you can write
"foo":"http://foo/"

This assumes that coercions are applied to *terms* rather than *URIs*.
That makes more sense to me, as it allows multiple terms that map to
the same URI but with different coercions (or whatever other rules you
come up with).

The "xsd:date" in the last line should probably be a full URI.
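A minimal sketch of points (a) to (d), assuming hypothetical helper names normalizeTerms and interpret, a made-up statement shape, and a placeholder birthday value:

```javascript
// Hypothetical sketch of the term model: each term has a URI ("@") and
// optionally a "type"; a bare string is shorthand for { "@": ... }.
function normalizeTerms(context) {
  const terms = {};
  for (const [name, def] of Object.entries(context)) {
    terms[name] = typeof def === "string" ? { "@": def } : def;
  }
  return terms;
}

// Coercion is looked up by term, not by URI, so two terms may share a URI
// and still carry different types.
function interpret(term, value, terms) {
  if (!Object.prototype.hasOwnProperty.call(terms, term)) {
    return null; // unknown term: no statement is produced in this sketch
  }
  const def = terms[term];
  return { predicate: def["@"], object: value, type: def.type || null };
}

const terms = normalizeTerms({
  "foaf": "http://xmlns.com/foaf/0.1/",
  "name": "http://xmlns.com/foaf/0.1/name",
  "homepage": { "@": "http://xmlns.com/foaf/0.1/homepage", "type": "url" },
  "birthday": { "@": "http://example.com/ns#birthday", "type": "xsd:date" }
});

const birthdayStatement = interpret("birthday", "1980-01-01", terms); // placeholder date
const nameStatement = interpret("name", "Manu Sporny", terms);
```

With this shape, adding further per-term properties later only means adding keys next to "@" and "type".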

Best,
Richard
