[dady] Call for review: Dataset Dynamics Guide


Michael Hausenblas

May 13, 2010, 8:45:11 AM
to dady

All,

From our last meeting (wow, that was in March - I guess we should have another one
in May, also to reflect the WWW10 outcome), I had an action to update
the "Change Notification Protocol" [1] in the group's Wiki.

Now, I thought it would make more sense to produce a self-contained document
that proposes a concrete combination of technologies, both to elicit feedback
and, eventually, to kick off implementations.

Hence, please find at [2] the "Dataset Dynamics Guide", my first attempt to
get all these things together. This is certainly an early draft, and I don't
expect the group as such to now say "great Michael, thanks, we're gonna
publish this as the final advice". Rather, I see it as a seed: a concrete
proposal from which we can start the discussion and, while we're
implementing it, extend and adapt the guide.

So, to make a long story short, please review the "Dataset Dynamics Guide"
[2] and send your comments to this list. I'll do my best to incorporate
all the requests and, where there are obvious disagreements, try
to resolve them here (or in a telecon - see the beginning of this mail ;)

Cheers,
Michael


[1] http://groups.google.com/group/dataset-dynamics/web/teleconferences
[2] http://lab.linkeddata.deri.ie/dady-guide/

--
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html


Leigh Dodds

May 13, 2010, 8:57:03 AM
to dataset-...@googlegroups.com
Hi,

Before defining a new mechanism, has the group evaluated and rejected
all of the other options?

I did a rundown of different types of notification [1], and it
strikes me that PubSubHubbub and/or SDShare cover a lot of the
necessary ground already. The underlying tech supports both push- and
pull-based systems, so they seem well adapted to a number of use cases.

With respect to the dady vocabulary, the issue I see here is that I
think it's hard to quantify, statically, what the frequency of updates
to a dataset might be. Clearly the publisher may have an idea, but
this may differ from reality, particularly if a dataset is managed by
a community. The frequency classifications also seem pretty
coarse-grained; I wonder whether a series of metrics, e.g. updates in the
last hour, 24 hours, month and year, would be more useful?
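To make the "series of metrics" idea concrete, here is a minimal Python sketch that derives update counts over trailing windows from a list of observed update timestamps. It assumes the timestamps come from some change log kept by the publisher (or an observer); the window names are hypothetical, and "month" and "year" are approximated as 30 and 365 days.

```python
from datetime import datetime, timedelta

# Hypothetical trailing windows; "month" and "year" are approximated
# as 30 and 365 days respectively.
WINDOWS = {
    "last_hour": timedelta(hours=1),
    "last_24_hours": timedelta(hours=24),
    "last_30_days": timedelta(days=30),
    "last_365_days": timedelta(days=365),
}


def update_metrics(update_times: list[datetime], now: datetime) -> dict[str, int]:
    """Count observed updates that fall inside each trailing window ending at `now`."""
    return {
        name: sum(1 for t in update_times if now - span < t <= now)
        for name, span in WINDOWS.items()
    }


if __name__ == "__main__":
    now = datetime(2010, 5, 13, 12, 0)
    observed = [now - timedelta(minutes=30), now - timedelta(hours=5),
                now - timedelta(days=10), now - timedelta(days=200)]
    print(update_metrics(observed, now))
```

Unlike a static frequency class, these counts are measured rather than declared, so they track what actually happens to a community-managed dataset; finer sensitivity just means adding narrower windows.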

Cheers,

L.

[1]. http://spreadsheets.google.com/pub?key=tLWdskoM-2--vLjUI05e7qQ&output=html
> Please consider the environment before printing this email.
>
> Find out more about Talis at http://www.talis.com/
> shared innovation™
>
> Any views or personal opinions expressed within this email may not be those of Talis Information Ltd or its employees. The content of this email message and any files that may be attached are confidential, and for the usage of the intended recipient only. If you are not the intended recipient, then please return this message to the sender and delete it. Any use of this e-mail by an unauthorised recipient is prohibited.
>
> Talis Information Ltd is a member of the Talis Group of companies and is registered in England No 3638278 with its registered office at Knights Court, Solihull Parkway, Birmingham Business Park, B37 7YB.
>



--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh...@talis.com
http://www.talis.com

Nathan

May 13, 2010, 9:05:06 AM
to dataset-...@googlegroups.com
Hi,

I have similar thoughts about PubSubHubbub.

Possibly the most (hopefully) constructive comment I can add at this
time is that perhaps the spec could be split in two:

1: a generic combination of Atom+Changesets (and possibly tying in with
PubSubHubbub and sparqlPush [1])

2: methods of linking to notifications / the Atom stream (for dady and
other vocabs, or even via a Link header or rel attributes?)

[1] http://apassant.net/blog/2010/04/18/sparql-pubsubhubbub-sparqlpush

Best,

Nathan

Kingsley Idehen

May 13, 2010, 9:48:21 AM
to dataset-...@googlegroups.com
+1 for fine-grained metrics (down to fractions of a second).

Remember, change sensitivity is a very serious "quality of data" factor.


Kingsley


--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen





Michael Hausenblas

May 13, 2010, 10:33:29 AM
to dady

Leigh,

Thanks a lot for your comments, very much appreciated!

> Before defining a new mechanism, has the group evaluated and rejected
> all of the other options?

> I did a run down of different types of notification [1], and it
> strikes me that PubSubhubbub and/or SDShare covers a lot of the
> necessary ground already. The underlying tech supports both push and
> pull based systems so seem well adapted to a number of use cases.

I guess there are two (slightly orthogonal) answers to this:

1. The group has discussed, and continues to discuss, the options. I don't
think that we as a whole have rejected any option yet.
2. That said, there is nothing new in the mechanism. It just makes clear how
all of this should play together. In my opinion ;)

The main goal, and let me be very clear on this, was exactly not to invent
anything new: just use existing technologies and describe a good practice,
if you wish. Pretty much the same as with the How to Publish Linked Data
guide - just clarify the DOs and DON'Ts.

What is most important to note is that, with Atom at its core, the memo opens
up the possibility of using PubSubHubbub, which would not have been possible
with RSS 1.0 (though that is more RDFish ;)

Makes sense?
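For readers less familiar with PubSubHubbub, here is a minimal Python sketch of the subscriber side, assuming a hub that implements the 0.3 draft: the subscriber POSTs a form-encoded subscription request to the hub, and the hub later verifies intent by sending a GET to the callback with a hub.challenge that must be echoed back. The hub, topic and callback URLs are hypothetical, and this is not a complete subscriber (no signature checking, lease renewal or content handling).

```python
import urllib.parse
import urllib.request
from typing import Optional


def subscription_body(topic: str, callback: str, verify: str = "async") -> bytes:
    """Form-encoded subscription request as described in the PubSubHubbub 0.3 draft."""
    return urllib.parse.urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,        # URL of the Atom feed announcing changes
        "hub.callback": callback,  # subscriber endpoint the hub will POST updates to
        "hub.verify": verify,      # "sync" or "async"
    }).encode("ascii")


def subscribe(hub: str, topic: str, callback: str) -> int:
    """Send the request; a 0.3 hub answers 204 (verified) or 202 (verification pending)."""
    req = urllib.request.Request(
        hub,
        data=subscription_body(topic, callback),  # passing data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def verify_intent(query: dict[str, str], expected_topic: str) -> Optional[str]:
    """Handle the hub's verification GET: echo hub.challenge only for our own topic."""
    if (query.get("hub.mode") in ("subscribe", "unsubscribe")
            and query.get("hub.topic") == expected_topic):
        return query.get("hub.challenge")
    return None
```

The point for dady is that none of this cares what the feed entries describe; an Atom feed of changesets is just another topic.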

> With respect to the dady vocabulary, the issue I see here is that I
> think its hard to quantify, statically, what the frequency of updates
> to a dataset might be. Clearly the publisher may have an idea, but
> this may differ from reality, particularly if a dataset is managed by
> a community. The frequency classifications also seem pretty coarse
> grained, I wonder whether a series of metrics, e.g. updates in last
> hour, 24 hours, month, year would be more useful?

Totally agree. It's a rough cut, yet to be discussed further (not only the
frequency part but the entire modelling of dady). The current design has
partly been influenced by the Dublin Core Collection Description Frequency
Vocabulary [1], but I'm more than happy to learn what other options we have.

Any advice on this one?

Cheers,
Michael

[1] http://dublincore.org/groups/collections/frequency/




Kingsley Idehen

May 13, 2010, 10:50:31 AM
to dataset-...@googlegroups.com
Michael,

Might be worth looking up DDE (Dynamic Data Exchange) and the NetDDE
variant.

PubSubHubbub makes an HTTP-driven implementation of DDE possible.

Links:

1. http://en.wikipedia.org/wiki/Dynamic_Data_Exchange

Michael Hausenblas

May 13, 2010, 11:20:23 AM
to dady

Nathan,

Same here, thanks a lot for your comments!

>
> Possibly the most (hopefully) constructive comment I can add at this
> time, is that perhaps the spec could be split in two.

Hmmm. It's a rather short document anyway. I'm trying to understand: what
would be the advantage of splitting it?

As Alexandre is a colleague of mine, I've already discussed sparqlPush with
him. I guess we're on the same page here. But yes, I agree, it should be
compatible ...

> 2: methods of linking to notifications / the atom stream (for dady and
> other vocabs / even via Link header or rel's?)

I think you have a very valid point here. Indeed, I'll draft a section on the
Link: header, in anticipation that it will become a standard soon. Hopefully
;)
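As a rough illustration of what such a section might cover, the Python sketch below extracts link targets with a given relation type from a Link: header (the draft that was later published as RFC 5988). The header value and the relation names in the example ("alternate" for a change feed, "hub" for a PubSubHubbub hub) are illustrative assumptions, not something the guide defines. The parser is deliberately simple and does not handle quoted parameter values that themselves contain commas or semicolons.

```python
import re

# One "<URI>; param=value; ..." segment of a Link header.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)')


def parse_link_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Split a Link header into (target URI, {param: value}) pairs."""
    links = []
    for match in _LINK.finditer(value):
        uri, params = match.group(1), match.group(2)
        attrs = {}
        for param in params.split(";"):
            key, sep, val = param.partition("=")
            if sep:
                attrs[key.strip().lower()] = val.strip().strip('"')
        links.append((uri, attrs))
    return links


def links_with_rel(value: str, rel: str) -> list[str]:
    """Targets whose (space-separated, case-insensitive) rel contains `rel`."""
    return [uri for uri, attrs in parse_link_header(value)
            if rel.lower() in attrs.get("rel", "").lower().split()]


header = ('<http://example.org/dataset/changes.atom>; rel="alternate"; '
          'type="application/atom+xml", <http://hub.example.org/>; rel="hub"')
print(links_with_rel(header, "hub"))
```

A client that finds the change feed this way needs no prior knowledge of dady or voiD; the HTTP response for the dataset itself points at it.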

Cheers,
Michael


Michael Hausenblas

May 13, 2010, 11:23:08 AM
to dady
Kingsley,

> Might be worth looking up DDE (Dynamic Data Exchange) and the NetDDE
> variant.
>
> pubsubhubub makes an HTTP driven implementation of DDE possible.

Again, thanks a lot - very interesting.

I remember coding some DDE stuff in the early '90s, but I somehow fail to
see how that could help us with the "frequency of updates" issue. Or did I
misunderstand your comment?

Cheers,
Michael


Nathan

May 13, 2010, 11:30:32 AM
to dataset-...@googlegroups.com
Michael Hausenblas wrote:
> Nathan,
>
> Same here, thanks a lot for you comments!
>
>> Possibly the most (hopefully) constructive comment I can add at this
>> time, is that perhaps the spec could be split in two.
>
> Hmmm. It's a rather short document anyway. I try to understand what would be
> the advantage of splitting it?

Probably none, just a mental split I needed :)

> As Alexandre is a colleague of mine I've discussed sparqlPush with him
> already. I guess we're on the same page here. But yes, I agree, it should be
> compatible ...

Concur, and it looks that way!

>> 2: methods of linking to notifications / the atom stream (for dady and
>> other vocabs / even via Link header or rel's?)
>
> I think you have a very valid point here. Indeed, I'll draft a section re
> Link: header in anticipation that this will be a standard, soon. Hopefully
> ;)

I can't figure out whether it is or not...

https://datatracker.ietf.org/doc/draft-nottingham-http-link-header/ (see
history tab)

"State Changes to RFC Ed Queue from Approved-announcement sent by
Cindy Morgan"


Aside: atomtriples, atomowl, foreign markup, even RDFa-tom? As you can
perhaps guess, Mike Amundsen and I are also talking about this!

Best,

Nathan

Kingsley Idehen

May 13, 2010, 11:50:03 AM
to dataset-...@googlegroups.com
Michael Hausenblas wrote:
> Kingsley,
>
>
>> Might be worth looking up DDE (Dynamic Data Exchange) and the NetDDE
>> variant.
>>
>> pubsubhubub makes an HTTP driven implementation of DDE possible.
>>
>
> Again, thanks a lot - very interesting.
>
> I remember coding some DDE stuff in the early 90ies but I somehow fail to
> see how that could help us re the "frequency of updates" issue, or did I
> misunderstand your comment?
>
> Cheers,
> Michael
>
>
Michael,

My response really wasn't about frequency of updates; I believe the +1
re. Leigh's comments was clear enough in that regard.

As with most of my communications re. contemporary Web technology,
simply look to the past, holistically. Maybe I should have created a new
subject (for topical clarity).

The subject matter realm of Data Set Dynamics (new moniker) is a subset
of what's already been achieved (in proprietary fashion) via Dynamic
Data Exchange :-)

Michael Hausenblas

May 13, 2010, 11:51:27 AM
to dady

Kingsley,

> My response really wasn't about frequency of updates, I believe the +1
> re. Leigh's comments where clear enough in that regard.
>
> As with most of my communications re. contemporary Web technology,
> simply look to the past, holistically. Maybe I should have created a new
> subject (for topical clarity).
>
> The subject matter realm of Data Set Dynamics (new moniker) is a sub set
> of what's already been achieved (in proprietary fashion) via Dynamic
> Data Exchange :-)

Got you now. Thanks for the clarification. I agree.

Cheers,
Michael


Kingsley Idehen

May 13, 2010, 11:58:39 AM
to dataset-...@googlegroups.com
Michael Hausenblas wrote:
> Kingsley,
>
>
>> My response really wasn't about frequency of updates, I believe the +1
>> re. Leigh's comments where clear enough in that regard.
>>
>> As with most of my communications re. contemporary Web technology,
>> simply look to the past, holistically. Maybe I should have created a new
>> subject (for topical clarity).
>>
>> The subject matter realm of Data Set Dynamics (new moniker) is a sub set
>> of what's already been achieved (in proprietary fashion) via Dynamic
>> Data Exchange :-)
>>
>
> Got you now. Thanks for the clarification. I agree.
>
> Cheers,
> Michael
>
>
Michael,

So, as implementations start to appear and storytelling collateral
starts to be assembled, let's build a bridge from DDE, showing its new and
open HTTP reincarnation, etc. This approach will garner attention
immediately, and HTTP's ubiquity will enable simple (but powerful) demos
from a plethora of tools (Web and native).

Michael Hausenblas

May 13, 2010, 12:06:26 PM
to dady
Kingsley,

> So as implementation start to appear, and story telling collateral
> starts to be assembled, lets build a bridge from DDE showing its new and
> open HTTP reincarnation etc.. This approach will garner attention
> immediately, and HTTP ubiquity will enable simple (but powerful) demos
> from a plethora of tools (Web and Native).

Sounds like a plan, though I'm unsure how well received DDE is/was in the
developer community. I myself was pretty happy to get out of the
DDE/OLE/OLE2 circles back then, but I might be totally wrong.

As an aside: IIRC you said some time ago that OpenLink was rolling out a
diff engine, and I was sort of hoping that you would reveal some of its
details now, but maybe it's still too early?

Cheers,
Michael


Kingsley Idehen

May 13, 2010, 12:53:39 PM
to dataset-...@googlegroups.com
Michael Hausenblas wrote:
> Kingsley,
>
>
>> So as implementation start to appear, and story telling collateral
>> starts to be assembled, lets build a bridge from DDE showing its new and
>> open HTTP reincarnation etc.. This approach will garner attention
>> immediately, and HTTP ubiquity will enable simple (but powerful) demos
>> from a plethora of tools (Web and Native).
>>
>
> Sounds like a plan, though unsure how well received DDE is/was in the
> developer community.

Don't think about the Developer Community. Think more about the communities
of Technology Users. They are the ones who will be blown away by
modern DDE via HTTP.
> I for myself was pretty happy to get out of the
> DDE/OLE/OLE2 circles back then, but I might be totally wrong.
>

You were confined to proprietary operating systems at the time.

I am sure people using Objective-C-based distributed objects felt the
same when confined to those black NeXT machines.

We are doing all the NeXTSTEP OS stuff and the DDE, OLE, OLE2, CAIRO stuff
via HTTP today, but without platform specificity.
> As an aside: IIRC you said some time ago that OpenLink was rolling out a
> diff engine and was sort of hoping that you would reveal some of its
> details, now, but maybe this is still to early?
>

It's nearly done. As you know, I like to simply unveil things with a live,
practical demo, for example using DBpedia or the LOD Cloud Cache :-)

We're nearly there re. a PubSubHubbub implementation that rides on our new
RDF delta engine.
> Cheers,
> Michael

Bernhard Haslhofer

May 14, 2010, 6:47:20 AM
to dataset-...@googlegroups.com
Great Michael, thanks ... for putting together the memo :-) Here are my comments:

* Granularity of change notifications:

- One outcome of the discussions in Raleigh was that notification of triple-level changes introduces considerable complexity for data providers, because they need to maintain the history of triple-level changes back to a certain point in the past. Keeping track of resource-level (entity-level) changes is less complex and has been successfully implemented by other protocols that deal with changes in data sets, such as OAI-PMH [1]. So if we want to keep the protocol simple and stupid, we should take this into account. I know we have already discussed this several times... but I want to raise the point again.

- In the current version of the guide you refer to HTTP caching and Memento for resource-level changes. To be honest, I don't get the point here. Can you explain?

- Thanks for the evset:ResourceChangeEvent hint. We will reconsider/fix this.

* Discovery mechanism:

- for the update frequencies we should use quantifiable values, i.e. numbers;

- for the client it is important to know from which datestamp on a data provider records data set changes. See the OAI-PMH earliestDatestamp element (http://www.openarchives.org/OAI/openarchivesprotocol.html#Identify)

- a dady:lastUpdate (as subproperty of http://purl.org/dc/terms/modified) is for the client the most valuable and for the data provider the easiest to implement meta-information it can offer

* Notification of changes:

- should we also consider the option of delivering the changesets embedded in the Atom feed? That would save lots of HTTP requests...

- any idea how we can get rid of SOME TITLE and SOME SUMMARY? We do not really need that info, right?
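To illustrate why resource-level tracking is cheaper than triple-level history, here is a minimal Python sketch of a provider-side log that keeps only the latest modification time per resource, plus the moment recording started (playing the role of OAI-PMH's earliestDatestamp) and a derived last-update value (what the proposed dady:lastUpdate would carry). The class and method names are hypothetical; a real provider would persist this and expose it over HTTP.

```python
from datetime import datetime


class ResourceChangeLog:
    """Hypothetical resource-level change log: one timestamp per resource URI,
    no per-triple history."""

    def __init__(self, started: datetime):
        # Changes before this moment were not recorded (cf. OAI-PMH earliestDatestamp).
        self.earliest_datestamp = started
        self._last_modified: dict[str, datetime] = {}

    def record(self, uri: str, when: datetime) -> None:
        """Remember only the most recent modification of each resource."""
        previous = self._last_modified.get(uri)
        if previous is None or when > previous:
            self._last_modified[uri] = when

    @property
    def last_update(self) -> datetime:
        """Most recent change in the whole dataset (what dady:lastUpdate could report)."""
        return max(self._last_modified.values(), default=self.earliest_datestamp)

    def changed_since(self, since: datetime) -> list[str]:
        """Resources modified at or after `since`, like an OAI-PMH 'from' query."""
        if since < self.earliest_datestamp:
            raise ValueError("changes before earliest_datestamp were not recorded")
        return sorted(uri for uri, when in self._last_modified.items() if when >= since)
```

The trade-off is that a client learns that a resource changed, not which triples changed, and then re-fetches the resource description.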


That's it so far. Cheers,

Bernhard



[1] OAI-PMH: http://www.openarchives.org/OAI/openarchivesprotocol.html
______________________________________________________
Research Group Multimedia Information Systems
Department of Distributed and Multimedia Systems
Faculty of Computer Science
University of Vienna

Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
E-Mail: bernhard....@univie.ac.at
WWW: http://www.cs.univie.ac.at/bernhard.haslhofer

Nathan

May 14, 2010, 6:59:48 AM
to dataset-...@googlegroups.com
Bernhard Haslhofer wrote:
> Great Michael, thanks ... for putting together the memo :-) Here are my comments:
>
> * Granularity of change notifications:
>
> - One outcome of the discussions in Raleigh was that the notification of triple-level changes introduces quite some complexity for data providers because they need to maintain the history of triple level-changes until a certain point in the past. Keeping track of resource-level (entity-level) changes is less complex and has successfully been implemented by other protocols that deal with changes in data sets, such as the OAI-PMH protocol [1]. So if we want to keep the protocol simple and stupid we should take this into account. I know we already discussed that several times...but I want to raise that point again.

This is quite an important issue; three different paths to addressing it are:

1- the Delta-V engine which Kingsley has mentioned

2- actually updating resources via changeset(s); then you have the
changes in the format needed.

3- I think I can open-source something for this :)
With regard to the above, I made an effort to cover RDF diff and patch a
few months ago with GUO[1]. I've since come to realise (primarily
thanks to Henry Story) that the ontology and the way it works can't be used:
it creates invalid statements in the updates (even though the diffs are
correct, and the graph produced after patching is as expected). So it has
just dawned on me that I could rescue the work done on it by making the
RDF Diff and Process use and consume the Changesets vocabulary instead.
The code for both should be easily portable to other languages and is
not SPARQL-dependent (it is currently written in PHP); this could be one
solution: diff(oldGraph, newGraph).

This isn't a definitive list, but they are some approaches.
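A minimal Python sketch of the diff(oldGraph, newGraph) idea, assuming graphs are plain sets of triples whose terms are already written in N-Triples syntax. Removals and additions are grouped per subject because a changeset (cs:ChangeSet) describes changes to one subject (cs:subjectOfChange), with each removed or added triple given as a reified rdf:Statement. The serialisation is a simplified illustration of the Changesets vocabulary, not Nathan's PHP code; typing cs:createdDate as xsd:dateTime is an assumption.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, predicate, object) in N-Triples term syntax

CS = "http://purl.org/vocab/changeset/schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def diff(old: set[Triple], new: set[Triple]) -> dict[str, tuple[set[Triple], set[Triple]]]:
    """Per-subject (removals, additions) that turn `old` into `new`."""
    changes = defaultdict(lambda: (set(), set()))
    for triple in old - new:
        changes[triple[0]][0].add(triple)
    for triple in new - old:
        changes[triple[0]][1].add(triple)
    return dict(changes)


def patch(graph: set[Triple], changes: dict[str, tuple[set[Triple], set[Triple]]]) -> set[Triple]:
    """Apply removals, then additions, for every subject."""
    result = set(graph)
    for removals, additions in changes.values():
        result -= removals
        result |= additions
    return result


def changeset_ntriples(subject: str, removals: set[Triple], additions: set[Triple],
                       created: str) -> list[str]:
    """Serialise one subject's changes as a cs:ChangeSet with reified statements."""
    lines = [
        f"_:cs <{RDF}type> <{CS}ChangeSet> .",
        f"_:cs <{CS}subjectOfChange> {subject} .",
        f'_:cs <{CS}createdDate> "{created}"^^<{XSD_DATETIME}> .',  # datatype assumed
    ]
    for kind, triples in (("removal", removals), ("addition", additions)):
        for i, (s, p, o) in enumerate(sorted(triples)):
            node = f"_:{kind}{i}"
            lines += [
                f"_:cs <{CS}{kind}> {node} .",
                f"{node} <{RDF}type> <{RDF}Statement> .",
                f"{node} <{RDF}subject> {s} .",
                f"{node} <{RDF}predicate> {p} .",
                f"{node} <{RDF}object> {o} .",
            ]
    return lines
```

Because patch(old, diff(old, new)) reproduces new, a consumer holding the old graph can apply the published changesets instead of re-fetching everything.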

> - In the current version of the guide you refer to HTTP caching and memento for resource-level changes. To be honest, I don't get the point here. Can you explain?

Granular Linked Data documents we publish should be HTTP-friendly, which
means using HTTP cache headers (Last-Modified, ETag, etc.). Then, by simply
doing a HEAD or a conditional GET, you can tell whether a resource
has changed, or only GET it when it has changed (If-None-Match,
If-Modified-Since).

*This may not be what Michael was referring to.
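A minimal Python sketch of the conditional GET described above, using only the standard library: the client replays the validators (ETag, Last-Modified) from its previous fetch and treats 304 Not Modified as "unchanged". The Accept value and the shape of the return value are assumptions for illustration.

```python
import urllib.request
from typing import Optional
from urllib.error import HTTPError


def conditional_headers(etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> dict[str, str]:
    """Validators from the previous response, replayed as preconditions."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url: str, etag: Optional[str] = None,
                     last_modified: Optional[str] = None):
    """Return (body, new_etag, new_last_modified), or None if the server says 304."""
    req = urllib.request.Request(url, headers={
        "Accept": "text/turtle",  # assumed serialisation of the resource description
        **conditional_headers(etag, last_modified),
    })
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 Not Modified as an HTTPError
            return None
        raise
```

Each tracked resource still costs one request per polling cycle, which is fine for a handful of resources.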

Best,

Nathan

Leigh Dodds

May 14, 2010, 8:07:00 AM
to dataset-...@googlegroups.com
Hi,

On 14 May 2010 11:59, Nathan <nat...@webr3.org> wrote:
> With regards the above, I made an effort to cover RDF diff and patch a
> few months ago with GUO[1] - I've since come to realise (primarily
> thanks to Henry Story) that the Ontology and way it works can't be used
> and create invalid statements in the updates (even though the diff's are
> correct, and the graph produced after patching is as expected). So, it's
> just dawned on me that I could rescue the work done on it by making the
> RDF Diff and Process use and consume the changesets vocabulary instead.
> The code for both should be easily portable to other languages, and is
> non sparql dependent (currently written in PHP) - this could be one
> solution - diff(oldGraph,newGraph).

FYI: there's already open source PHP code for generating and
submitting changesets in Moriarty (http://code.google.com/p/moriarty).
Why not build on, or add to, that to implement the server side?

There are quite a few projects using the Changesets vocab for exchanging
and storing diffs.

Cheers,

L.

Michael Hausenblas

May 15, 2010, 4:43:45 AM
to dady

Bernhard,

> Great Michael, thanks ... for putting together the memo :-) Here are my
> comments:

Thanks a bunch for the detailed comments!

> - One outcome of the discussions in Raleigh was that the notification of
> triple-level changes introduces quite some complexity for data providers
> because they need to maintain the history of triple level-changes until a
> certain point in the past. Keeping track of resource-level (entity-level)
> changes is less complex and has successfully been implemented by other
> protocols that deal with changes in data sets, such as the OAI-PMH protocol
> [1]. So if we want to keep the protocol simple and stupid we should take this
> into account. I know we already discussed that several times...but I want to
> raise that point again.

I tried to formulate an issue based on this comment but failed. May I ask
you to give it a try at [1]?

> - In the current version of the guide you refer to HTTP caching and memento
> for resource-level changes. To be honest, I don't get the point here. Can you
> explain?

Imagine a case where one deals with 10 resources from DBpedia
(resource-level changes). Would I advise using Dataset Dynamics? Certainly
not. One can do this with the mentioned technologies. Now take the case where
one uses several thousand resources, for example based on certain
categories in Wikipedia (dataset-level changes): here it would make sense
to use the proposed mechanism.

OK, or is there still a need for clarification?

> - Thanks for the evset:ResourceChangeEvent hint. We will reconsider/fix this.

Cool, thanks.

> - for the update frequencies we should quantifiable values, i.e. numbers;

Is this covered by Issue 1 [2]? If not, can you add it there?

> - for the client it is important to know from which datestamp on a data
> provider records data set changes. See the OAI-PMH earliestDatestamp element
> (http://www.openarchives.org/OAI/openarchivesprotocol.html#Identify)

I agree, and I proposed using atom:updated [3] (see Code 2). Maybe I should
make this more explicit?

> - a dady:lastUpdate (as subproperty of http://purl.org/dc/terms/modified) is
> for the client the most valuable and for the data provider the easiest to
> implement meta-information it can offer

Not sure why this is needed. If you look at Fig. 1 (Dataset Dynamics overall
process), there are essentially two metadata types involved: the
time-oblivious, static voiD+dady and the time-sensitive, dynamic Atom+CS.
Where in this setup would you see dady:lastUpdate taking place?

> - should we also consider the option to deliver the changesets embedded into
> the the atom feed? That would save lot's of HTTP requests...

Hm. Unsure about this. I guess only implementation experience can tell. I'd
typically expect the CS (as in Code 3) to be quite large; maybe thousands
of triples in many resources would change. Delivering them by value rather
than by reference would bloat the Atom feed indefinitely and make it
unusable. Further, a consumer should first be able to determine whether a
certain change is relevant to its case, and not be forced to consume the
entire big feed.

Again, if you think this is a valuable requirement, please raise an issue
at [1].

> - any idea how we can get rid of SOME TITLE and SOME SUMMARY? We do not really
> need that info, right?

This is intentional. Rather than having this metadata in the CS, I propose
to have it in the Atom feed (same as with atom:updated vs cs:createdDate).


Cheers,
Michael

[1] http://code.google.com/p/dady/issues/list
[2] http://code.google.com/p/dady/issues/detail?id=1
[3] http://tools.ietf.org/html/rfc4287#section-4.2.15


Michael Hausenblas

May 15, 2010, 4:58:17 AM
to dady

Nathan,

>> * Granularity of change notifications:

> This is quite an important issue, 3 different paths to this are:
>
> 1- Delta-V engine which Kingsley has mentioned

*Sigh*

The mysterious delta engine, again. Seems you know more already, and I
have to wait till Kingsley finally announces it, I guess :D

> 2- Actually updating resources via changset(s), then you have the
> changes in the format needed.

Yes.

> 3- I think I can opensource something for this :)
> With regards the above, I made an effort to cover RDF diff and patch a
> few months ago with GUO[1] - I've since come to realise (primarily
> thanks to Henry Story) that the Ontology and way it works can't be used
> and create invalid statements in the updates (even though the diff's are
> correct, and the graph produced after patching is as expected).

Glad you figured it out. That's one of the reasons why I proposed CS.

> Granular Linked Data Documents we publish should be HTTP friendly, this
> would indicate using HTTP Cache headers (Last-Modified, ETag etc) - thus
> by simply doing a HEAD or a conditional GET you can tell if a resource
> has changed, or only GET it when it has changed (If-Match,
> If-Modified-Since).
>
> *This may not be what Michael was referring too

Essentially, yes. As I tried to explain in my answer to Bernhard earlier,
as long as one is interested in changes to a small set of resources, the
known mechanisms (HTTP caching, etc.), as outlined by you, Nathan, are
sufficient. However, keeping track of a large number of resources (now, the
tricky question is: what is 'a large number', really? Again, only
implementation experience can/will tell) cannot, IMO, be done efficiently
with the available solutions mentioned. Hence the need for a Dataset Dynamics
solution.

Cheers,
Michael


Michael Hausenblas

May 15, 2010, 5:00:15 AM
to dady

Leigh,

> FYI: there's already open source PHP code for generating and
> submitting changesets in Moriaty (http://code.google.com/p/moriarty)
> why not build on, or add to that, to implement the server side?

+1 ... any takers?

> There's quite a few projects using the Changesets vocab for exchanging
> & storing diffs.

Could you please point me to a place that lists them? I failed to find such
an overview, but would like to include it in the DD guide, as a sort of
back-up or justification for CS.

Cheers,
Michael


Nathan

May 15, 2010, 6:39:44 AM
to dataset-...@googlegroups.com
Michael Hausenblas wrote:
> Nathan,
>
>>> * Granularity of change notifications:
>
>> This is quite an important issue, 3 different paths to this are:
>>
>> 1- Delta-V engine which Kingsley has mentioned
>
> *Sigh*
>
> The mysterious delta engine, again. Seems you know more, already - and I
> have to wait till Kingsley finally announces it, I guess :D

Sadly not, that's all I know of it thus far - other than that Ivan has
definitely been doing quite a lot of work on it for some time.

>> 2- Actually updating resources via changeset(s); then you have the
>> changes in the format needed.
>
> Yes.
>
>> 3- I think I can open source something for this :)
>> With regard to the above, I made an effort to cover RDF diff and patch a
>> few months ago with GUO[1]. I've since come to realise (primarily
>> thanks to Henry Story) that the ontology, and the way it works, can't be
>> used: it creates invalid statements in the updates (even though the diffs
>> are correct, and the graph produced after patching is as expected).
>
> Glad you figured it out. That's one of the reasons why I proposed CS.

It wasn't so much that I figured it out; rather, Henry told me, in many
ways, until I finally got it :p

>> The granular Linked Data documents we publish should be HTTP-friendly,
>> which means using HTTP cache headers (Last-Modified, ETag, etc.). Then
>> by simply doing a HEAD or a conditional GET you can tell if a resource
>> has changed, or only GET it when it has changed (If-None-Match,
>> If-Modified-Since).
>>
>> *This may not be what Michael was referring to
>
> Essentially, yes. As I tried to explain earlier in my answer to Bernhard,
> as long as one is interested in changes to a small set of resources, the
> known mechanisms (HTTP caching, etc.) that you outlined, Nathan, are
> sufficient. However, keeping track of a big number of resources (now, the
> tricky question is: what is 'a big number', really? Again, only
> implementation experience can/will tell) cannot, IMO, be done efficiently
> with the available solutions mentioned. Hence the need for a Dataset
> Dynamics solution.

concurred :)
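Here is a minimal sketch of the per-resource conditional GET discussed above, in Python using only the standard library. The function name `fetch_if_changed` is hypothetical. It sends a previously stored ETag back in If-None-Match and a stored Last-Modified value in If-Modified-Since. It treats a 304 Not Modified answer as "unchanged". It polls one resource at a time, so it illustrates the small-set case rather than tracking a big number of resources.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Conditional GET on one resource (hypothetical helper).

    Returns (changed, body, etag, last_modified). On 304 Not Modified,
    changed is False, body is None and the known validators are kept.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag          # validator from a previous ETag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            # Remember the new validators for the next poll.
            return (True, body,
                    response.headers.get("ETag", etag),
                    response.headers.get("Last-Modified", last_modified))
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 as an HTTPError; any other status is a real error.
        if err.code == 304:
            return False, None, etag, last_modified
        raise
```

A client would store the returned ETag/Last-Modified pair per resource and pass it back on the next poll. A HEAD request could be used instead when only the change flag is needed.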

Nathan

May 15, 2010, 6:47:43 AM5/15/10
to dataset-...@googlegroups.com
Michael Hausenblas wrote:
> Leigh,
>
>> FYI: there's already open source PHP code for generating and
>> submitting changesets in Moriarty (http://code.google.com/p/moriarty)
>> why not build on, or add to that, to implement the server side?
>
> +1 ... any takers?

Yup, I've had a look through and I think the code would fall under
SimpleGraph functionality and need a little refactoring.

The only vague issue is that I *think* my code is PHP 5 dependent and
Moriarty is PHP 4 - but a more in-depth look, or indeed just doing it,
will soon clear this up!

>> There's quite a few projects using the Changesets vocab for exchanging
>> & storing diffs.
>
> Could you please point me to a place that lists them? I failed to find such
> an overview, but would like to include it in the DD guide, as a sort of
> back-up or justification for CS.

+1, I would love that list. From what I can tell, though, after looking
into this issue quite a lot: at its simplest, reification is needed
unless you use quads, and whatever approach you take to using reification
for updates will undoubtedly end up the same as changesets, even if under
a different name. So changesets seem like the logical choice and do the
job (IMHO), certainly until we have a common serialization for RDF that
includes named graphs.
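To illustrate the reification point, here is a rough Python sketch that serialises a single cs:ChangeSet as Turtle. Each removed or added triple is reified as an rdf:Statement. It assumes the Changeset vocabulary namespace http://purl.org/vocab/changeset/schema# and its cs:subjectOfChange, cs:changeReason, cs:removal and cs:addition terms. The function names and example.org URIs are hypothetical. Terms are passed in already serialised, with no validation beyond a same-subject check.

```python
CS = "http://purl.org/vocab/changeset/schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reified(triple):
    """Render one (s, p, o) triple as a reified rdf:Statement blank node."""
    s, p, o = triple
    return (f"[ a rdf:Statement ; rdf:subject {s} ; "
            f"rdf:predicate {p} ; rdf:object {o} ]")


def changeset_turtle(subject, removals=(), additions=(), reason=None):
    """Serialise one cs:ChangeSet about `subject` as Turtle (hypothetical helper).

    Terms must already be in Turtle syntax, e.g. "<http://ex.org/a>" or '"x"'.
    All triples are expected to share `subject`, the cs:subjectOfChange.
    """
    removals, additions = list(removals), list(additions)
    for s, _, _ in removals + additions:
        if s != subject:
            raise ValueError(f"triple subject {s} is not {subject}")
    props = [f"cs:subjectOfChange {subject}"]
    if reason is not None:
        # Escape backslashes and quotes for a plain Turtle string literal.
        escaped = reason.replace("\\", "\\\\").replace('"', '\\"')
        props.append(f'cs:changeReason "{escaped}"')
    props += [f"cs:removal {reified(t)}" for t in removals]
    props += [f"cs:addition {reified(t)}" for t in additions]
    body = " ;\n    ".join(props)
    return (f"@prefix cs: <{CS}> .\n"
            f"@prefix rdf: <{RDF}> .\n\n"
            f"[] a cs:ChangeSet ;\n    {body} .\n")
```

For example, `changeset_turtle(s, removals=[(s, p, '"old"')], additions=[(s, p, '"new"')])` describes replacing one literal value. The new value is stated as a reified addition, and the old value is a reified removal.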

Best,

Nathan

Kingsley Idehen

May 15, 2010, 11:41:20 AM5/15/10
to dataset-...@googlegroups.com
Michael,

Big is on the scale of DBpedia (esp. the Live Edition) or LOD Cloud
Cache :-)

Michael Hausenblas

May 16, 2010, 4:34:05 AM5/16/10
to dady
Nathan,

>>> FYI: there's already open source PHP code for generating and
>>> submitting changesets in Moriarty (http://code.google.com/p/moriarty)
>>> why not build on, or add to that, to implement the server side?
>>
>> +1 ... any takers?
>
> Yup, I've had a look through and I think the code would fall under
> SimpleGraph functionality and take a slight bit of refactoring.

That would be brilliant, yes! We might want to chat a bit about this. I had
a sort of testbed in mind, as follows:

Set up a 'hub' at [1] that offers dataset-level changes of LOD datasets,
starting with DBpedia. That is, use Wikipedia's original change feed [2],
possibly along with the Wikipedia API [3]. Nick has recently done something
in this direction with dbpedialite [4], which we might be able to build
upon as well.
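Here is a rough sketch of the first step of such a hub. It assumes the hub polls the Atom feed at [2] and reads each entry's title and updated time. It then maps the changed article title to a DBpedia resource URI by replacing spaces with underscores and percent-encoding the rest. The function names, the skipped-namespace list and the encoding rule are illustrative assumptions and may not match exactly how DBpedia or dbpedialite mint URIs. Fetching the feed itself is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
# Hypothetical, incomplete list of non-article namespaces to ignore.
SKIP_PREFIXES = ("Talk:", "User:", "User talk:", "Wikipedia:",
                 "File:", "Template:", "Category:")


def title_to_dbpedia(title):
    """Map a Wikipedia page title to a DBpedia resource URI (assumed rule)."""
    name = title.strip().replace(" ", "_")
    return DBPEDIA_RESOURCE + urllib.parse.quote(name, safe="_()',-.!*:")


def dbpedia_changes(atom_xml):
    """Return (dbpedia_uri, updated) pairs for article entries in an Atom feed."""
    root = ET.fromstring(atom_xml)
    changes = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        updated = entry.findtext(ATOM + "updated", default="")
        if not title or title.startswith(SKIP_PREFIXES):
            continue  # skip empty titles and non-article namespaces
        changes.append((title_to_dbpedia(title), updated))
    return changes
```

The hub would then republish these URIs as its dataset-level change feed. Whether the changed triples are fetched from the Wikipedia API [3] or recomputed is a separate design choice.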

Cheers,
Michael

[1] http://updates.linkedopendata.org
[2] http://en.wikipedia.org/w/index.php?title=Special:Recentchanges&feed=atom
[3] http://en.wikipedia.org/w/api.php
[4] http://github.com/njh/dbpedialite





> From: Nathan <nat...@webr3.org>
> Organization: webr3
> Reply-To: dady <dataset-...@googlegroups.com>
> Date: Sat, 15 May 2010 11:47:43 +0100
> To: dady <dataset-...@googlegroups.com>
> Subject: Re: [dady] Call for review: Dataset Dynamics Guide
>