[dady] Re: RDF Dataset Notifications

Niklas Lindström

Apr 17, 2010, 7:26:54 AM
to Leigh Dodds, Linking Open Data, Richard Cyganiak, dataset-...@googlegroups.com
Hi Leigh!

On Fri, Apr 16, 2010 at 10:19 PM, Leigh Dodds <leigh...@talis.com> wrote:
> There's been a fair bit of discussion, and more than a few papers around
> dataset notifications recently. I've written up a blog post and a quick
> survey of technologies to start to classify the available approaches:
>
> http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/

Nice summary! This is a most important topic. I think the Dataset
Dynamics Group [1] is very relevant to this, so I've CC:ed that list.
(Note: posting to that requires membership.)

Also, the W3C eGov group may be of interest to you, where work is
currently under way to take the dcat vocabulary forwards [2]. See e.g.
[3], and my followup [4] which focuses on this aspect of datasets
(outlined in the COURT project [5]).

For the needs I've had so far, Atom seems a very viable way forward
(and pubsubhubbub is a very powerful extension to that method).
However, it would be very beneficial to the community if the different
RDF vocabularies (i.e. AtomOwl [6], those listed at [7], and OPM [8])
could be consolidated somehow, especially for logging and RDF-based
data store implementation purposes. One thing these models seem to lack
is a way to represent deletions (see e.g. [9] for an openvocab
extension to AtomOwl for those).
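To make the pubsubhubbub part concrete: a subscriber finds the hub through a link rel="hub" element in the feed. It then POSTs a form-encoded request to that hub carrying hub.mode, hub.topic (the feed URL) and hub.callback (where the hub should push new entries). The following is a minimal Python sketch of building that request body. The feed and callback URLs are hypothetical. Drafts of the spec differ on further parameters, such as the verification mode, so those are left out here.

```python
from urllib.parse import urlencode


def build_subscription_body(topic_url: str, callback_url: str,
                            mode: str = "subscribe") -> str:
    """Form-encoded body a subscriber POSTs to a pubsubhubbub hub.

    Only the core parameters are included; verification-related
    parameters vary between spec drafts and are omitted in this sketch.
    """
    if mode not in ("subscribe", "unsubscribe"):
        raise ValueError(f"unsupported hub.mode: {mode}")
    return urlencode({
        "hub.mode": mode,
        "hub.topic": topic_url,        # the Atom feed announcing dataset changes
        "hub.callback": callback_url,  # subscriber endpoint the hub pushes to
    })
```

The hub then pushes new or changed entries to the callback, so consumers avoid polling the feed.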

How/if this can be related to SIOC is another interesting question. I
think care should be taken to differentiate between the domain
described by the content and the (mechanical) way datasets, their
repositories, modifications and syndications are described, so as to
keep things orthogonal.

My take is basically close to a resource (or even named graph)
oriented approach. I consider (atom) entries as a package of one or
more closely related information resources sharing a common topic (say
a document, person, vocabulary or a data(sub)set). In my work I store
all RDF extractable from such an entry in a timestamped context
(corresponding to the entry itself). I think this is also close to how
many content repositories (e.g. DSpace and Fedora) are (or should be)
modelled.
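The following is a minimal Python sketch of the per-entry, timestamped context idea. It assumes triples are plain string tuples and entries are keyed by their Atom id. EntryContextStore and its methods are hypothetical names, not taken from any existing store. Each update replaces the entry's whole context, and a deletion drops it:

```python
from datetime import datetime

# A triple is (subject, predicate, object); plain strings keep the sketch self-contained.
Triple = tuple[str, str, str]


class EntryContextStore:
    """Hypothetical store holding one timestamped context (named graph) per entry."""

    def __init__(self) -> None:
        # entry id -> (updated timestamp, triples extracted from that entry)
        self._contexts: dict[str, tuple[datetime, frozenset[Triple]]] = {}

    def put_entry(self, entry_id: str, updated: datetime, triples: set[Triple]) -> bool:
        """Replace the entry's context wholesale unless the update is stale."""
        current = self._contexts.get(entry_id)
        if current is not None and current[0] >= updated:
            return False  # we already hold this version or a newer one
        self._contexts[entry_id] = (updated, frozenset(triples))
        return True

    def delete_entry(self, entry_id: str) -> bool:
        """Drop the entry's whole context, e.g. when the feed signals a deletion."""
        return self._contexts.pop(entry_id, None) is not None

    def triples(self) -> set[Triple]:
        """Union of all contexts, i.e. the current state of the dataset."""
        result: set[Triple] = set()
        for _, graph in self._contexts.values():
            result |= graph
        return result
```

Replacing whole contexts keeps the bookkeeping simple. The cost is that a one-triple change still rewrites the entire entry, which matters for high-volume updates.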

An entry in this view corresponds to a resource with one or more
representations, also carrying possible attachments such as images or
appendices to the primary document. This should be aligned with
REST principles and resource-oriented management of datasets. (And
may be considered related to, but more simplistic than, e.g. OAI-ORE
[10] or even CMIS.)

Of course, as you clearly mention, this is suboptimal for high-volume
updates. The discussion on "DBpedia hosting burden" on this list,
especially the thoughts on using BitTorrent [11] or similar, is
interesting here. I still wonder whether a resource-oriented model
wouldn't be a good way to represent the underlying repository though
(and then combined with triple-centric updates).
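For the triple-centric side, a changeset between two versions of one resource's graph can be sketched as plain set differences. The helper names are hypothetical. This naive approach assumes the graphs contain no blank nodes, since those would need graph isomorphism matching rather than string comparison:

```python
# A triple is (subject, predicate, object) as plain strings (no blank nodes assumed).
Triple = tuple[str, str, str]


def changeset(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (added, removed) triples that turn `old` into `new`."""
    return new - old, old - new


def apply_changeset(graph: set[Triple], added: set[Triple],
                    removed: set[Triple]) -> set[Triple]:
    """Apply a changeset: drop removed triples first, then add new ones."""
    return (graph - removed) | added
```

This lets the repository stay resource-oriented (one graph per entry) while the notifications carry only the triples that changed.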

I'm thinking that the dataset dynamics group may be the most
appropriate forum to take this further. What do you think? It would be
great to have it aligned with the dcat work as well (and/or void+dady,
depending on whether these progress together).

Best regards,
Niklas

[1]: <http://groups.google.com/group/dataset-dynamics/>
[2]: <http://vocab.deri.ie/dcat>
[3]: <http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0021.html>
[4]: <http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0022.html>
[5]: <http://code.google.com/p/court/>
[6]: <http://bblfish.net/work/atom-owl/2006-06-06/AtomOwl.html>
[7]: <http://groups.google.com/group/dataset-dynamics/web/components-vocabularies-protocols-formats>
[8]: <http://openprovenance.org/>
[9]: <http://open.vocab.org/docs/DeletedEntry>
[10]: <http://www.openarchives.org/ore/>
[11]: <http://lists.w3.org/Archives/Public/public-lod/2010Apr/0205.html>

Nathan

Apr 18, 2010, 3:03:51 PM
to Leigh Dodds, dataset-...@googlegroups.com
Also of interest, from apassant:

http://apassant.net/blog/2010/04/18/sparql-pubsubhubbub-sparqlpush


Leigh Dodds wrote:
> Hi,
>
> 2010/4/17 Niklas Lindström <linds...@gmail.com>:
>> Nice summary! This is a most important topic. I think the Dataset
>> Dynamics Group [1] is very relevant to this, so I've CC:ed that list.
>> (Note: posting to that requires membership.)
>
> While I'd seen some of the output from the Vienna workshop (I was
> there!), I wasn't aware that the group existed, so apologies for that.
> I'll have a look at the group archives.
>
>> Also, the W3C eGov group may be of interest to you, where work is
>> currently under way to take the dcat vocabulary forwards [2]. See e.g.
>> [3], and my followup [4] which focuses on this aspect of datasets
>> (outlined in the COURT project [5]).
>
> Yep. I'm a member of the eGov IG and one of the reasons I included
> "Dataset Notifications" is to support some of the use cases from
> there.
>
>> For the needs I've had so far, Atom seems a very viable way forward
>> (and pubsubhubbub is a very powerful extension to that method).
>> However, it would be very beneficial to the community if the different
>> RDF vocabularies (i.e. AtomOwl [6], those listed at [7], and OPM [8])
>> could be consolidated somehow, especially for logging and RDF-based
>> data store implementation purposes. One thing these models seem to lack
>> is a way to represent deletions (see e.g. [9] for an openvocab
>> extension to AtomOwl for those).
>
> Do you mean rationalising vocabularies so that there's a clear way to
> use Atom to syndicate RDF updates, or an RDF mapping for Atom? While
> related, I think these are two different goals.
>
>> How/if this can be related to SIOC is another interesting question. I
>> think care should be taken to differentiate between the domain
>> described by the content and the (mechanical) way datasets, their
>> repositories, modifications and syndications are described, so as to
>> keep things orthogonal.
>
> Agreed.
>
>> My take is basically close to a resource (or even named graph)
>> oriented approach. I consider (atom) entries as a package of one or
>> more closely related information resources sharing a common topic (say
>> a document, person, vocabulary or a data(sub)set). In my work I store
>> all RDF extractable from such an entry in a timestamped context
>> (corresponding to the entry itself). I think this is also close to how
>> many content repositories (e.g. DSpace and Fedora) are (or should be)
>> modelled.
>
> Yes, I think a resource oriented approach will fit well with many
> existing protocols and formats.
>
>> I'm thinking that the dataset dynamics group may be the most
>> appropriate forum to take this further. What do you think? It would be
>> great to have it aligned with the dcat work as well (and/or void+dady,
>> depending on whether these progress together).
>
> I'll look at signing up to the list. Barring incorporation of
> comments/feedback, I'm done with my first pass through this as I
> wanted to clarify for myself the different trade-offs and integration
> patterns.
>
> Cheers,
>
> L.
>
