Feedback from ResourceSync tutorial at OAI8

Herbert Van de Sompel

Jun 19, 2013, 10:32:12 AM
to resour...@googlegroups.com
Hi all,

Rob Sanderson, Richard Jones, and I just gave the first ResourceSync tutorial at OAI8 in Geneva. We think it went really well. I would like to thank Martin Klein for the enormous amount of work he put into getting a slide deck together for us. And many thanks to my co-presenters for a great job!

I would like to share these comments/observations:

* A need was expressed for the ability for a Destination to define a "set of resources". Think searches that define a set of resources. Our thoughts:
-- RS Hard to do interoperable search semantics
-- HVdS Resulting load not predictable
-- RJ Use metadata about resource subject to synchronization (e.g. describedby) and client side filtering instead
-- HVdS Should put something to illustrate the RJ approach in the spec

* Provide a link to the Dump Manifest in the Resource Dump per ZIP file. Use "resourcedump-manifest" as rel type. Similar in Change Dump. Avoids unnecessary downloads of ZIPs

* Can use a link with a "profile" relation type to express the nature of the metadata (e.g. the XML Namespace of the metadata format) when the resource to be synchronized is a metadata record

* Need an overview of attributes and link relation types in the slides

* from/until note from Rob
-- RS What is the until of the most recent changelist when the publication strategy is "publish every 100 changes", i.e. an unknown schedule? It must be given, as it's not a snapshot, but we don't know the end point?

* people would love to have recommendations re when to implement which capability

* people are intrigued re the relationship with PMH, ORE
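
A minimal sketch of the client-side filtering idea from the first bullet, combined with the "profile" link from the third: the Destination reads the links attached to each resource and keeps only what it wants, so the Source never has to support search. The Link and Resource classes, the filter_by_link helper, and all example.com URIs are hypothetical names used for illustration, not part of the spec.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    rel: str   # link relation type, e.g. "describedby" or "profile"
    href: str  # target of the link


@dataclass
class Resource:
    uri: str
    links: list = field(default_factory=list)


def filter_by_link(resources, rel, href):
    """Keep only resources carrying a link with the given rel and href."""
    return [
        r for r in resources
        if any(link.rel == rel and link.href == href for link in r.links)
    ]


# Hypothetical entries, as a Destination might hold them after parsing a Resource List.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
resources = [
    Resource("http://example.com/rec/1.xml", [Link("profile", OAI_DC)]),
    Resource("http://example.com/img/1.png"),
    Resource("http://example.com/rec/2.xml", [Link("profile", "http://www.loc.gov/mods/v3")]),
]

# The Destination decides locally which metadata records it cares about.
selected = filter_by_link(resources, "profile", OAI_DC)
print([r.uri for r in selected])
```

The same helper would work for "describedby" links; the filtering cost sits entirely with the Destination, which keeps the load on the Source predictable.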

Herbert



Simeon Warner

Jun 26, 2013, 10:53:08 AM
to resour...@googlegroups.com
On 6/19/13 10:32 AM, Herbert Van de Sompel wrote:
> * from/until note from Rob
> -- RS What is the until of most recent changelist when publication
> strategy is publish every 100 changes, eg unknown schedule? Must be
> given, as it's not a snapshot but don't know end point?

I think the until date at any time the changelist is written/updated
would be "now".

Cheers,
Simeon

Robert Sanderson

Jun 27, 2013, 12:25:44 PM
to Simeon Warner, resour...@googlegroups.com
Hi Simeon,

So every time you update the changelist doc, you also update the until date on it (and in the index/capability list if it's there).

My concern is that the client then doesn't know whether there is another changelist after that date, and will always have to go and look for one unnecessarily. With a different solution, that wouldn't be necessary.
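
To make the concern concrete, here is a rough sketch of what a Destination might do with from/until values read from a Change List index. The ChangeListInfo class, the lists_to_fetch helper, the utc shortcut, and the example.com URIs are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeListInfo:
    uri: str
    from_time: datetime   # value of the "from" attribute
    until_time: datetime  # value of the "until" attribute


def lists_to_fetch(index, last_sync):
    """Change lists that may hold changes made after last_sync, oldest first."""
    pending = [cl for cl in index if cl.until_time > last_sync]
    return sorted(pending, key=lambda cl: cl.from_time)


def utc(day, hour):
    # Shortcut for building June 2013 timestamps in this example.
    return datetime(2013, 6, day, hour, tzinfo=timezone.utc)


# Hypothetical index: the second list is still being appended to by the Source.
index = [
    ChangeListInfo("http://example.com/changelist1.xml", utc(1, 0), utc(10, 0)),
    ChangeListInfo("http://example.com/changelist2.xml", utc(10, 0), utc(19, 12)),
]

todo = lists_to_fetch(index, last_sync=utc(12, 0))
print([cl.uri for cl in todo])
```

Nothing in the second entry tells the client whether that list is closed or whether a newer one exists, so after processing it the client still has to fetch the index again to find out.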

Rob






--
You received this message because you are subscribed to the Google Groups "ResourceSync" group.
To unsubscribe from this group and stop receiving emails from it, send an email to resourcesync+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Herbert Van de Sompel

Jun 27, 2013, 1:30:16 PM
to Robert Sanderson, Simeon Warner, resour...@googlegroups.com
On Jun 27, 2013, at 18:25, Robert Sanderson <azar...@gmail.com> wrote:

> Hi Simeon,
>
> So every time you update the changelist doc, you also update the until date on it (and in the index/capability list if it's there).
>
> My concern is that the client then doesn't know that there isn't another changelist after that date and will always have to go and look for it unnecessarily when if we had a different solution, that wouldn't be necessary.

Would be good to make more explicit which different solution you have in mind. Are you referring to the prev/next pointers that we used to have to page between Change Lists?

At a more general level, I think that we have introduced a significant implementation cost for the Source with the from/until approach. I understand we did it to address Graham's concern regarding guaranteeing "completeness" of Change Lists and Change Dumps towards Destinations. The cause is good, the price really high.

Herbert




Martin Klein

Jun 27, 2013, 5:33:45 PM
to resour...@googlegroups.com
fwd'ing Graham's msg to the group


---------- Forwarded message ----------
From: Graham Klyne <gkl...@googlemail.com>
Date: Thu, Jun 27, 2013 at 3:18 PM
Subject: Re: [resourcesync] Feedback from ResourceSync tutorial at OAI8
To: Herbert Van de Sompel <hvd...@gmail.com>, Robert Sanderson <azar...@gmail.com>
Cc: Simeon Warner <simeon...@cornell.edu>, "resour...@googlegroups.com" <resour...@googlegroups.com>


Hmm.. do you really expect to update documents dynamically? If nothing else, I see that interacting badly with HTTP caching.

Generally, I think there's always some trade-off between sender convenience/efficiency and receiver convenience/efficiency. I think that if the semantics are clear then there's scope for some of the trade-off to be determined on-the-fly. One "cost" that might be borne as part of this trade-off is some unnecessary duplication of operations.

My model for a sender is that it collects data for a change list as the changes are made, and at the point of generating a change list it uses the current time as its "until" value. Depending on how it's implemented, changes occurring after that point might still make it into the change list. The index could then be generated once all the change lists have been generated, with a timestamp of the earliest "until" value. Similar logic could apply for a full resource list. The wrinkle here is that if changes after the initial timestamp make their way into a list, they must still be retained for the next round, or there could be apparent or real gaps in the record.
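
One possible reading of the sender model above, as a small sketch rather than a reference implementation. The ChangeCollector class, its method names, the dict layout of the change list, and the example.com URIs are all hypothetical.

```python
from datetime import datetime, timezone


class ChangeCollector:
    """Collects changes as they happen and cuts a change list on demand."""

    def __init__(self, start):
        self.pending = []        # (timestamp, uri, change_type) tuples
        self.last_until = start  # "until" of the previously published list

    def record(self, when, uri, change):
        self.pending.append((when, uri, change))

    def publish(self, now):
        """Build a change list whose "until" is the time generation started."""
        # Changes stamped after `now` may already be in the buffer and slip in.
        entries = list(self.pending)
        # Retain those late changes so the next list reports them too,
        # avoiding apparent or real gaps in the record.
        self.pending = [e for e in self.pending if e[0] > now]
        change_list = {"from": self.last_until, "until": now, "entries": entries}
        self.last_until = now
        return change_list


def t(minute):
    # Shortcut for timestamps on the afternoon of 27 June 2013 (UTC).
    return datetime(2013, 6, 27, 15, minute, tzinfo=timezone.utc)


collector = ChangeCollector(start=t(0))
collector.record(t(5), "http://example.com/a", "updated")
collector.record(t(12), "http://example.com/b", "created")  # after the first "until"

first = collector.publish(now=t(10))
second = collector.publish(now=t(20))
print(len(first["entries"]), [e[1] for e in second["entries"]])
```

The late change to /b shows up in both lists, which is the "unnecessary duplication of operations" a receiver would have to tolerate under this strategy.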

Different systems could suit different strategies, depending on their local situation - the standard should be able to accommodate as many approaches as possible.

#g.


