Some further reading


Ben Werdmuller

Apr 29, 2008, 6:11:05 AM
to Open Data Definition
Apologies if these are how you came to the mailing list, but I thought
I'd link to a couple of explanatory blog posts that have been posted
recently:

Introducing the Open Data Definition serves as an introduction to why
we think it's necessary:
http://blogs.zdnet.com/social/?p=477

More of a technical overview here:
http://ben.elgg.com/?action=comments&postid=46

Marcus Povey goes further and includes a breakdown and ODD examples:
http://www.marcus-povey.co.uk/2008/04/16/introducing-the-open-data-definition/

Danny Ayers

Apr 29, 2008, 7:10:53 AM
to open-data-...@googlegroups.com
On 29/04/2008, Ben Werdmuller <ben...@gmail.com> wrote:

> Introducing the Open Data Definition serves as an introduction to why
> we think it's necessary:
> http://blogs.zdnet.com/social/?p=477

Don't get me wrong, ODD may have a useful role to play, but this is wildly off the mark:
[[
The semantic web community has RDF, a format designed for the purpose that is potentially powerful but – as one might expect from the semantic web community – prone to ambiguity and overcomplicated implementation. In small doses, it works (FOAF is based on a subset of RDF), but for more abstract data, it becomes exponentially harder to build for. Adding new data fields requires doing contortions in XML, which makes it harder to generate dynamically. RDF parsers are also not widely supported, and it seems unlikely that most web coders would bother to read through the specification, let alone sit down and actually write compliant software.
 ]]

Most of these statements are factually incorrect.
Point by point -

"The semantic web community has RDF, a format..."
- RDF is a data model, not a format. It has a variety of serialization formats, even things like microformats can be treated as domain-specific RDF formats.

"...designed for the purpose that is potentially powerful but – as one might expect from the semantic web community – prone to ambiguity and overcomplicated implementation."

Data ambiguity is avoided by using URIs as identifiers for entities (resources) and relationships (properties). The specs couldn't be much less ambiguous. Most of the libraries allow you to work at a level that only exposes the minimum complexity (enough to get the job done).

"In small doses, it works (FOAF is based on a subset of RDF)"

- not really a subset; FOAF is an RDF vocabulary, and as such allows whatever RDF you want.

"...but for more abstract data, it becomes exponentially harder to build for..."

- because the model is simple, the effort tends to grow roughly linearly with the complexity of whatever you're modeling. The lifesciences community (who have seriously complex data) have been big adopters of RDF and related technologies, at least in part because it doesn't get harder.

"Adding new data fields requires doing contortions in XML,"

- simply not true. A six year old can get their head around the concepts:
http://www.ldodds.com/blog/archives/000329.html

Practical data isn't much harder, and you don't need any XML (if you don't want it). Take this example using the FOAF vocabulary (in Turtle syntax):

@prefix :  <http://xmlns.com/foaf/0.1/> .

[  a :Person;
  :name "Danny Ayers" ] .

Let's say I want to add a fairly obscure field (property). Assuming this doesn't exist already, I'll put it in a namespace of my own:

@prefix :  <http://xmlns.com/foaf/0.1/> .
@prefix feet:  <http://purl.org/stuff/feet#> .

[  a :Person;
  :name "Danny Ayers" ;
  feet:ukShoeSize "9" ] .

"....which makes it harder to generate dynamically..."

The example above, using rdflib in Python would look something like:

graph.add((me, FEET['ukShoeSize'], Literal('9')))
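
To make the shape of that concrete without assuming any particular toolkit is installed, here is a rough sketch in plain Python. It is not rdflib: the graph is just a set of (subject, predicate, object) tuples, and the blank node label `_:me` and the helper `to_ntriples` are made up for illustration. The point it tries to show is only that a new property is one more triple, with no schema to change.

```python
# Stand-in for an RDF graph: a set of (subject, predicate, object) tuples.
# NOT rdflib; names below are illustrative assumptions.
FOAF = "http://xmlns.com/foaf/0.1/"
FEET = "http://purl.org/stuff/feet#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

me = "_:me"  # hypothetical blank node label

graph = set()
graph.add((me, RDF_TYPE, FOAF + "Person"))
graph.add((me, FOAF + "name", '"Danny Ayers"'))

# Adding the obscure property is a single extra triple.
graph.add((me, FEET + "ukShoeSize", '"9"'))


def to_ntriples(triples):
    """Render the triples one per line, N-Triples style."""
    lines = []
    for s, p, o in sorted(triples):
        # Literals and blank nodes are written as-is; URIs get angle brackets.
        obj = o if o.startswith('"') or o.startswith("_:") else "<%s>" % o
        lines.append("%s <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


print(to_ntriples(graph))
```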

"...RDF parsers are also not widely supported..."

RDF parsers are available for at least C, Haskell, Java, JavaScript, Common Lisp, .Net/Mono, Perl, PHP, Pike, Prolog, Python and Ruby - most of these languages have several available. Google is your friend.

"...and it seems unlikely that most web coders would bother to read through the specification, let alone sit down and actually write compliant software."

While web coders that wish to use any standards should be reasonably familiar with the relevant specs, it's pretty easy to get started by downloading a library for your favourite language, and playing with the examples.

I personally believe the aims of ODD are achievable with RDF out of the box, without any need for a new spec. But ok, you believe that ODD is necessary - fair enough. As I mentioned in my previous mail, ODD (the format) should be straightforward to map to RDF (the model), so it's not an either/or choice.

See also:
http://www.w3.org/RDF/FAQ

Cheers,
Danny.

--
http://dannyayers.com
~
http://blogs.talis.com/nodalities/this_weeks_semantic_web/

Ben Werdmuller

Apr 30, 2008, 8:19:00 AM
to Open Data Definition
On Apr 29, 12:10 pm, "Danny Ayers" <danny.ay...@gmail.com> wrote:

> "The semantic web community has RDF, a format..."
> - RDF is a data model, not a format. It has a variety of serialization
> formats, even things like microformats can be treated as domain-specific RDF
> formats.

It is, and I apologise for my imprecise language here and elsewhere.
However, at the point of import and export, it must take the form of
an agreed format. Although RDF doesn't have to be XML, this is the
most likely format for two services to pick to talk to each other.

> A six year old can get their head around the concepts: http://www.ldodds.com/blog/archives/000329.html

Yes, a six year old can get his or her head around relationships.
Namespaces and definitions? Not so much, I'd wager.

What I'm picking at is not the RDF model itself, which is, as you say,
simple. It's the implementations and the actual real-world handling of
it.

> Let's say I want to add a fairly obscure field (property). Assuming this
> doesn't exist already, I'll put it in a namespace of my own:
>
> [..] "....which makes it harder to generate dynamically..."
>
> [..] The example above, using rdflib in Python would look something like:

... once you've added the namespace.

I think the difference here is the difference between deeper semantic
data and plain-text tags. For some purposes, the former is important;
for some uses, the latter will not just suffice, but it's impractical
to ask end users to enter the data required for anything else. In no
way are we saying that RDF is useless; what we are saying is that most
people building a new social app are unlikely to bother with it, and
for any kind of interoperability to happen, we have to make it as
painless as possible. This kind of thing is why OpenID has taken off
in a lot of places that more complicated standards like SAML couldn't
touch. Another great example is RSS 1.0 vs RSS 0.92.

> I personally believe the aims of ODD are achievable with RDF out of the box,
> without any need for a new spec. But ok, you believe that ODD is necessary -
> fair enough. As I mentioned in my previous mail, ODD (the format) should be
> straightforward to map to RDF (the model), so it's not an either/or choice.

I absolutely agree that tools and libraries should be made available
to make this usable for as many people as possible, simply.

Ben

Danny Ayers

May 1, 2008, 4:17:56 AM
to open-data-...@googlegroups.com
On 30/04/2008, Ben Werdmuller <ben...@gmail.com> wrote:

> On Apr 29, 12:10 pm, "Danny Ayers" <danny.ay...@gmail.com> wrote:
>
> > "The semantic web community has RDF, a format..."
> > - RDF is a data model, not a format. It has a variety of serialization
> > formats, even things like microformats can be treated as domain-specific RDF
> > formats.
>
> It is, and I apologise for my imprecise language here and elsewhere.

Fair enough.

> However, at the point of import and export, it must take the form of
> an agreed format.  Although RDF doesn't have to be XML, this is the
> most likely format for two services to pick to talk to each other.

True enough, but dealing with any XML syntax directly can be troublesome. Even dealing with text can be hard work...

> > A six year old can get their head around the concepts: http://www.ldodds.com/blog/archives/000329.html
>
> Yes, a six year old can get his or her head around relationships.
> Namespaces and definitions? Not so much, I'd wager.

Namespaces (as used by RDF) are fairly arbitrary groupings of URIs, and URIs aren't really anything but names... Ok, definitions of any kind can be tricky.

> What I'm picking at is not the RDF model itself, which is, as you say,
> simple. It's the implementations and the actual real-world handling of
> it.

Hmm...personally I tend to find getting the model right the tricky bit, the actual nuts & bolts being relatively straightforward, certainly no worse than say object-oriented coding or SQL.

> > Let's say I want to add a fairly obscure field (property). Assuming this
> > doesn't exist already, I'll put it in a namespace of my own:
> >
> > [..] "....which makes it harder to generate dynamically..."
> >
> > [..] The example above, using rdflib in Python would look something like:
>
> ... once you've added the namespace.

FEET = Namespace("http://purl.org/stuff/feet#")

done.

btw, I should probably have used PHP for these examples, to relate to Elgg. There are at least three RDF toolkits for PHP but the one that's probably most notable in this context is ARC -
http://arc.semsol.org/

a recent subproject is a plugin for WordPress -
http://bnode.org/blog/2008/01/15/rdf-tools-an-rdf-store-for-wordpress


> I think the difference here is the difference between deeper semantic
> data and plain-text tags. For some purposes, the former is important;
> for some uses, the latter will not just suffice, but it's impractical
> to ask end users to enter the data required for anything else.

But there are two separate sides to that - the user interface and the information representation. The former can be simple without cutting corners on the latter - e.g. a tag is potentially a lot more useful if you know who made the tagging, on which site and when. That kind of data can be captured in the local system transparently to the user (and I would think usually is). So when sharing the info across systems, why use a representation that doesn't retain such data in a globally unambiguous way? I'm not trying to suggest ODD is lossy in this way, just that plain-text tags can be deeper than they seem :-)
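
As a rough illustration of that split, the sketch below keeps the user-facing input to a single string while the system records the who/where/when around it. Everything here is hypothetical: the `Tagging` class, the `http://example.org/tagging#` vocabulary and the example URIs are placeholders, not part of ODD, FOAF or any published tagging ontology.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical vocabulary, for illustration only.
EX = "http://example.org/tagging#"


@dataclass(frozen=True)
class Tagging:
    label: str      # the plain text the user typed
    resource: str   # URI of the thing that was tagged
    tagger: str     # URI identifying who made the tagging
    site: str       # URI of the site it happened on
    when: datetime  # when the system recorded it


def to_triples(t, node="_:tagging"):
    """Flatten one tagging into (subject, predicate, object) tuples."""
    return [
        (node, EX + "label", t.label),
        (node, EX + "resource", t.resource),
        (node, EX + "tagger", t.tagger),
        (node, EX + "site", t.site),
        (node, EX + "when", t.when.isoformat()),
    ]


# The user only ever typed "web2.0"; the rest is captured by the system.
tagging = Tagging(
    label="web2.0",
    resource="http://example.org/posts/1",
    tagger="http://example.org/users/ben",
    site="http://example.org/",
    when=datetime(2008, 5, 1, 9, 17, tzinfo=timezone.utc),
)
triples = to_triples(tagging)
```

Because every value other than the label and timestamp is a URI, another system can tell two "web2.0" tags apart by who made them and where.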


> In no
> way are we saying that RDF is useless; what we are saying is that most
> people building a new social app are unlikely to bother with it, and
> for any kind of interoperability to happen, we have to make it as
> painless as possible.

Ok, this seems perfectly reasonable.

> This kind of thing is why OpenID has taken off
> in a lot of places that more complicated standards like SAML couldn't
> touch. Another great example is RSS 1.0 vs RSS 0.92.

OpenID is great for the core job it does well (some of the extensions seem a bit clunky, but that's not really relevant). I'm not sure RSS is exactly a good example though, given that its dumbing-down significantly impacted what you can do with the data, and the sloppy spec led to a whole new revision a couple of years later (Atom). A fair proportion of RSS 2.0 in the wild nowadays uses namespaced extensions (e.g. iTunes) which has re-introduced syntax complexity comparable to RSS 1.0 but not the benefits of using a consistent, Web-oriented model...

> > I personally believe the aims of ODD are achievable with RDF out of the box,
> > without any need for a new spec. But ok, you believe that ODD is necessary -
> > fair enough. As I mentioned in my previous mail, ODD (the format) should be
> > straightforward to map to RDF (the model), so it's not an either/or choice.
>
> I absolutely agree that tools and libraries should be made available
> to make this usable for as many people as possible, simply.

 
Formats like ODD can act as such a tool. Assuming it has a namespace of its own (which is generally a good idea, so you can safely use terms from it in other formats) and a few specific bits and pieces in the namespace doc (pointing to a mapping to RDF), ODD can be interpreted as RDF data without any changes to the format itself.

Ben Werdmuller

May 1, 2008, 10:49:28 AM
to Open Data Definition
On May 1, 9:17 am, "Danny Ayers" <danny.ay...@gmail.com> wrote:

> btw, I should probably have used PHP for these examples, to relate to Elgg.
> There are at least three RDF toolkits for PHP but the one that's probably
> most notable in this context is ARC - http://arc.semsol.org/

I'll check that out; the one I've been most familiar with is RAD.

> So when sharing the info across systems, why use a representation that
> doesn't retain such data in a globally unambiguous way? I'm not trying to
> suggest ODD is lossy in this way, just that plain-text tags can be deeper
> than they seem :-)

Absolutely they are, and I think you're right in saying that most
systems have that data. On the other hand, there's a contingent that
would prefer to see a more topic map like representation, which is
impractical, although if it could be captured it would almost
certainly be more useful for a bunch of applications. There's a
balance between usability and parseability, which I think we both
agree on.

> I'm not sure RSS is exactly a
> good example though, given that its dumbing-down significantly impacted what
> you can do with the data, and the sloppy spec led to a whole new revision a
> couple of years later (Atom). A fair proportion of RSS 2.0 in the wild
> nowadays uses namespaced extensions (e.g. iTunes) which has re-introduced
> syntax complexity comparable to RSS 1.0 but not the benefits of using a
> consistent, Web-oriented model...

I think that's fair; I've personally used extensions like GeoRSS. The
big question is, why did users turn their noses up at 1.0? And would
the benefits of an RDF-based model really outweigh the less consistent
model out there today for most purposes? Again, I think it's down to
the ease of implementation. RSS is at the level where you don't even
have to implement a proper XML parser; a little bit of pattern
matching does for a lot of people. Not saying that's what they
*should* be doing, but they clearly are.
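
For what it's worth, here is a small sketch of what that trade-off looks like in practice. The feed content and the example.org links are invented for illustration; the comparison is between a regular expression and Python's standard-library XML parser on the same text.

```python
import re
import xml.etree.ElementTree as ET

# Made-up RSS 0.92-style feed; the second title uses a CDATA section.
feed = """<rss version="0.92"><channel>
<title>Example feed</title>
<item><title>First post</title><link>http://example.org/1</link></item>
<item><title><![CDATA[Q&A <round 2>]]></title><link>http://example.org/2</link></item>
</channel></rss>"""

# "A little bit of pattern matching": quick, and fine for the first item,
# but the CDATA wrapper leaks through on the second.
regex_titles = re.findall(r"<item><title>(.*?)</title>", feed)

# A real XML parser unwraps CDATA and entities for you.
parsed_titles = [item.findtext("title") for item in ET.fromstring(feed).iter("item")]

print(regex_titles)   # ['First post', '<![CDATA[Q&A <round 2>]]>']
print(parsed_titles)  # ['First post', 'Q&A <round 2>']
```

The pattern-matching route works until a feed does something slightly unusual, which is roughly the usability-versus-parseability balance under discussion.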

> Formats like ODD can act as such a tool. Assuming it has a namespace of its
> own (which is generally a good idea, so you can safely use terms from it in
> other formats) and a few specific bits and pieces in the namespace doc
> (pointing to a mapping to RDF), ODD can be interpreted as RDF data without
> any changes to the format itself.

Which sounds like not a bad thing - allowing for the simpler schema
we've been discussing as well as elements to be used by people for
other purposes is probably the most open and flexible path to take.

Ben

Hendy Irawan

May 6, 2008, 10:09:13 AM
to Open Data Definition, he...@rainbowpurple.com, par...@gmail.com
Hi,

In this discussion I have to agree strongly with many things Danny
said. And hopefully ODD can take significant direction from the points
he raised, especially with regard to RDF.

On May 1, 3:17 pm, "Danny Ayers" <danny.ay...@gmail.com> wrote:
> On 30/04/2008, Ben Werdmuller <benw...@gmail.com> wrote:
>
>
>
> > On Apr 29, 12:10 pm, "Danny Ayers" <danny.ay...@gmail.com> wrote:
>
> > > "The semantic web community has RDF, a format..."
> > > - RDF is a data model, not a format. It has a variety of serialization
> > > formats, even things like microformats can be treated as domain-specific
> > > RDF formats.
>
> > It is, and I apologise for my imprecise language here and elsewhere.
>
> Fair enough.
>
> > However, at the point of import and export, it must take the form of
> > an agreed format.  Although RDF doesn't have to be XML, this is the
> > most likely format for two services to pick to talk to each other.
>
> True enough, but dealing with any XML syntax directly can be troublesome.
> Even dealing with text can be hard work...

When choosing a representation format, I'd vote for N3/Turtle, which is
relatively easy to read for humans, and for machines using RDF toolkits.
RDF/XML is too verbose for humans, but OK for machines.

The problem with "plain XML" is that it doesn't specify its ontology.
Even if you specify an (God forbid) XML Schema or the nicer RELAX-NG,
it only specifies the document structure of XML -- and *not* the
semantics.

My latter point is the core sentiment here. By using RDF you're also
(re)using the semantics that we've come across all these years. Even
the ancient Dublin Core would help.

RDF isn't "future" technology. We can already use it now with as much
difficulty as plain XML. But designing a standard over RDF will be
potentially beneficial in the present and near future. (Yes, indeed
referring to Semantic Web)

> > > A six year old can get their head around the concepts:
> > > http://www.ldodds.com/blog/archives/000329.html
>
> > Yes, a six year old can get his or her head around relationships.
> > Namespaces and definitions? Not so much, I'd wager.
>
> Namespaces (as used by RDF) are fairly arbitrary groupings of URIs, and URIs
> aren't really anything but names... Ok, definitions of any kind can be
> tricky.
>
> > What I'm picking at is not the RDF model itself, which is, as you say,
> > simple. It's the implementations and the actual real-world handling of
> > it.
>
> Hmm...personally I tend to find getting the model right the tricky bit, the
> actual nuts & bolts being relatively straightforward, certainly no worse
> than say object-oriented coding or SQL.

I have to agree esp. on the "namespaces" thing. It makes things
easier, rather than more difficult. Namespaces are "just names", or
perhaps loosely "aliases".

When we say "dc:creator" it is exactly the same as saying
"http://purl.org/dc/elements/1.1/creator". It's just that we alias the name
"dc" to the URL of Dublin Core Elements 1.1. We do that irrespective
of the actual format (i.e. can be mammoth RDF/XML or N3/Turtle).
There's no reason why we can't say this:

<tag xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>web2.0</dc:title>
<dc:creator>Ben Werdmuller</dc:creator>
</tag>

It's still valid XML. And parsers/readers that don't care about
namespaces can just (dangerously) hardcode the "dc:" prefix or simply
not use it.
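
A quick sketch of what a namespace-aware reader sees, using Python's standard-library ElementTree. One caveat worth stating: such a parser needs the "dc" prefix bound by an xmlns:dc declaration on the element or an ancestor, so the snippet declares it on <tag>. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

doc = """<tag xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>web2.0</dc:title>
  <dc:creator>Ben Werdmuller</dc:creator>
</tag>"""

root = ET.fromstring(doc)

# The parser replaces the "dc:" alias with the full namespace URI.
creator = root.find("{%s}creator" % DC)
print(creator.tag)   # {http://purl.org/dc/elements/1.1/}creator
print(creator.text)  # Ben Werdmuller

# The prefix is only a local alias: any prefix mapped to the same URI works.
title = root.find("dcterms:title", {"dcterms": DC})
print(title.text)    # web2.0
```

The last lookup uses a different prefix on purpose: what identifies the element is the URI, not the letters "dc".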

The above sample can also be written in N3/Turtle more concisely.

> > > Let's say I want to add a fairly obscure field (property). Assuming this
> > > doesn't exist already, I'll put it in a namespace of my own:
>
> > > [..] "....which makes it harder to generate dynamically..."
>
> > > [..] The example above, using rdflib in Python would look something
> > > like:
>
> > ... once you've added the namespace.
>
> FEET = Namespace("http://purl.org/stuff/feet#")
>
> done.
>
> btw, I should probably have used PHP for these examples, to relate to Elgg.
> There are at least three RDF toolkits for PHP but the one that's probably
> most notable in this context is ARC - http://arc.semsol.org/
>
> a recent subproject is a plugin for WordPress - http://bnode.org/blog/2008/01/15/rdf-tools-an-rdf-store-for-wordpress
>
> > I think the difference here is the difference between deeper semantic
> > data and plain-text tags. For some purposes, the former is important;
> > for some uses, the latter will not just suffice, but it's impractical
> > to ask end users to enter the data required for anything else.
>
> But there are two separate sides to that - the user interface and the
> information representation. The former can be simple without cutting corners
> on the latter - e.g. a tag is potentially a lot more useful if you know who
> made the tagging, on which site and when. That kind of data can be captured
> in the local system transparently to the user (and I would think usually
> is). So when sharing the info across systems, why use a representation that
> doesn't retain such data in a globally unambiguous way? I'm not trying to
> suggest ODD is lossy in this way, just that plain-text tags can be deeper
> than they seem :-)
>
> In no

I have to agree again with this. Some people are trying to push
"semantic tagging", but most apps are blind to what a particular tag
means (semantically).

Regardless with tagging, I'd