ever changing content in change lists


Richard Jones

Mar 27, 2013, 3:26:47 PM
to resour...@googlegroups.com
Hi Folks,

I hope the meeting last week was good - I haven't had a chance to
catch up yet, but I will do so as soon as possible. Apologies again
for not being able to attend.

I'm currently working on our implementation for DSpace of ResourceSync
supporting metadata synchronisation (the OAI-PMH use case - but
without attempting to emulate the actual OAI-PMH functionality). We
are confident that with the addition of isPartOf as a valid link rel,
we can provide a very effective solution entirely within the RS

In terms of implementation itself, I'm wrestling with a slightly
tricky conundrum and wonder if anyone can give me some insights.

I need to produce a Change List (or possibly a succession of - say -
weekly change lists) which lists both metadata and content resources.
Given that metadata and files are not versioned in DSpace, when the
metadata of a resource is updated, the previous metadata is no longer
available. The RS spec says that a Change List should (may?) contain
all of the changes that happened to a resource. So if my metadata is
updated twice, I would expect two entries in the Change List, one for
each metadata update.
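
To make the two-entry case concrete, here is a rough Python sketch that
serialises such a Change List. The URI, the timestamps and the function
name build_change_list are made up for illustration. The element and
attribute names (rs:md, capability, change) follow one reading of the
ResourceSync drafts, so treat them as assumptions rather than the
definitive format.

```python
import xml.etree.ElementTree as ET

# Namespaces as commonly used for sitemaps and ResourceSync (assumed here).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_change_list(changes):
    """Serialise (uri, change_type, timestamp) tuples, oldest first."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    ET.SubElement(urlset, f"{{{RS_NS}}}md", {"capability": "changelist"})
    for uri, change_type, timestamp in changes:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = timestamp
        ET.SubElement(url, f"{{{RS_NS}}}md", {"change": change_type})
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical example: the same metadata record updated twice.
changes = [
    ("http://example.org/metadata/123", "updated", "2013-03-20T10:00:00Z"),
    ("http://example.org/metadata/123", "updated", "2013-03-25T16:30:00Z"),
]
change_list_xml = build_change_list(changes)
print(change_list_xml)
```

Both entries point at the same URI, which is exactly where the trouble
starts if only the latest metadata can be served from that URI.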

A problem arises because DSpace metadata is stored in a database, and
the obvious way to provide a metadata resource is to generate it on
the fly upon request, meaning that only the most recent version of the
metadata is ever available. In order to service a Change List which
requires all interim metadata records to be available, I would have to
serialise and store each version of the metadata each time the DSpace
metadata is updated; this is do-able, incidentally, but more work and
more complexity and more storage space.

The OAI-PMH behaviour in this situation is to list items that have
changed since some requested date and provide their /latest/ metadata,
so in terms of meeting its use case, that's all I really need to
achieve too (although doing the full treatment might be desirable in
the long run). I am having trouble, though, visualising a Change List
that provides this, and understanding what impact it has on historical
change lists.
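
As a point of comparison, the OAI-PMH-style behaviour described above
can be sketched roughly as follows. This is only an illustration: the
function name latest_metadata_since, the item identifiers and the shape
of the items dictionary are hypothetical, and no real DSpace or OAI-PMH
API is being called.

```python
from datetime import datetime, timezone


def latest_metadata_since(items, since):
    """Return the current metadata of every item modified at or after `since`.

    `items` maps an item id to (last_modified, metadata), where metadata
    is only ever the latest state; earlier versions are not kept.
    """
    return {
        item_id: metadata
        for item_id, (last_modified, metadata) in items.items()
        if last_modified >= since
    }


# Hypothetical repository state: item/1 was edited twice, but only the
# second revision still exists.
items = {
    "item/1": (datetime(2013, 3, 25, tzinfo=timezone.utc), {"title": "Second revision"}),
    "item/2": (datetime(2013, 2, 1, tzinfo=timezone.utc), {"title": "Unchanged"}),
}
since = datetime(2013, 3, 6, tzinfo=timezone.utc)
changed = latest_metadata_since(items, since)
print(changed)
```

The item appears once, with its latest metadata, no matter how many
times it changed during the period.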

Suppose I want to produce weekly change lists, so a client can come
back to me after 3 weeks and catch up on the changes in that period by
consuming the last 3 change lists. In this set of changes is a
metadata record which changed twice. When the client comes across the
first change (assuming that it starts from the oldest record and works
its way forward), it will come across a URL for a metadata record
which contains one of three things:

1/ a 404 - this version of the metadata record has been lost to the
mists of time
2/ a metadata record as it was on that date
3/ the latest metadata record

Which is correct/best? And considering that (2) might not be a
practical option, which of (1) and (3) is best? I'm currently leaning
towards (1) as (3) might be misleading.

Or: is there another way that I can assemble my Change Lists such that
this issue is avoided? I don't want to have to go back and edit
historical change lists, since this would not only be time consuming
but also not in the spirit of ResourceSync, I feel. Is there some
other solution that I'm overlooking, such as limiting myself to only
one change list for a short time period (the period since the last
resource list was generated would be sensible), and keeping it
de-duplicated (which would be easier than modifying historical change
lists, and slightly less distasteful)?
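
The de-duplication idea could look something like the sketch below,
which keeps only the most recent change per URI. The function name
deduplicate_changes and the sample URIs are hypothetical, and the sketch
assumes that all timestamps are UTC strings in the same ISO 8601 format,
so that plain string comparison orders them correctly.

```python
def deduplicate_changes(changes):
    """Keep only the most recent (uri, change_type, timestamp) per URI.

    Assumes timestamps share one ISO 8601 UTC format, so comparing the
    strings is equivalent to comparing the times.
    """
    latest = {}
    for uri, change_type, timestamp in changes:
        if uri not in latest or timestamp >= latest[uri][2]:
            latest[uri] = (uri, change_type, timestamp)
    # Return the surviving entries oldest first.
    return sorted(latest.values(), key=lambda change: change[2])


# Hypothetical changes recorded since the last resource list.
changes = [
    ("http://example.org/metadata/123", "updated", "2013-03-20T10:00:00Z"),
    ("http://example.org/bitstream/456", "deleted", "2013-03-22T09:15:00Z"),
    ("http://example.org/metadata/123", "updated", "2013-03-25T16:30:00Z"),
]
deduplicated = deduplicate_changes(changes)
for entry in deduplicated:
    print(entry)
```

One consequence of this design is that a resource created and then
updated within the period would be listed only as updated, so a client
would need to treat an update to an unknown URI as a new resource.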

Similar problems arise when talking about content files, although they
tend to change less, and deletion is probably the primary thing that I
need to deal with.

Interested in your thoughts.




Richard Jones,

Founder, Cottage Labs
t: @richard_d_jones, @cottagelabs
w: http://cottagelabs.com