more feedback from development work


Richard Jones

May 10, 2013, 10:13:33 AM
to resour...@googlegroups.com
Hi Folks,

Continuing my series of emails with feedback from the implementation
work against DSpace:

1/ Should we give more guidance on what should be at the URL used for
the link@rel="describedby" in the Capability List? The example in the
spec points to a .xml file, but there's no further explanation of
what might be in that file. In my implementation, I have pointed this
to a human-readable web page, as I'm not sure what else could really
be at that location. Since the example points to an XML file, it
makes me wonder what machine-readable content might be there. If
there's no specific intent, I'd recommend changing the text to say
explicitly that this is recommended to be human-readable.

2/ In the specification there is some confusion between Change List
Index and Change List Archive. The diagrams say Index, as does some
of the text, but the section is called Change List Archive. I've
assumed that this is just a naming inconsistency and that the two are
actually the same thing.

3/ We're lacking namespaces for use in the XML for the "pri" and
"encoding" attributes. For the moment I've put them in without a
namespace, but I have explicitly namespaced all other attributes where
their origin is clear. I realise that attributes are not affected by
the default namespace in the same way as elements are, so this may not
matter.
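A quick way to check the point about default namespaces is to parse a small fragment with Python's standard `xml.etree.ElementTree`: the element picks up its namespace, but an unprefixed attribute stays in no namespace at all. This is a minimal sketch; the ResourceSync namespace URI, the sample URL and the `md_attributes` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"  # assumed ResourceSync namespace URI

SAMPLE = f"""<urlset xmlns="{SM_NS}" xmlns:rs="{RS_NS}">
  <url>
    <loc>http://example.org/item.pdf</loc>
    <rs:md length="651281" type="application/pdf"/>
  </url>
</urlset>"""


def md_attributes(xml_text):
    """Hypothetical helper: return (tag, attributes) of the first rs:md element."""
    root = ET.fromstring(xml_text)
    # Elements take the default (sitemap) namespace or the rs: prefix ...
    md = root.find(f"{{{SM_NS}}}url/{{{RS_NS}}}md")
    # ... but unprefixed attributes are in no namespace: keys are plain names.
    return md.tag, dict(md.attrib)


tag, attrs = md_attributes(SAMPLE)
print(tag)
print(attrs)
```

The attribute keys come back as plain `length` and `type`, which is why unprefixed attributes on an `rs:` element are interpreted through that element rather than through any default namespace.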

4/ What is the updated date of a changelist? Is it the date that the
changelist was created, or a time shortly after the most recent listed
change? The reason I ask is that during initial changelist
construction, I may want to generate several, each describing, say,
one week of the last month. Do they all have the same updated
date, or should each of them appear to have been updated near the time
period that it represents? The obvious answer is that the
changelist's updated date is the last date on which it was physically
updated, but I don't see much use in this for the consumer of a
changelist, and only a mild administrative benefit for the generator.
Perhaps the real question I'm asking is "what
information is the 'updated' date attempting to convey?"
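To make the question concrete, here is a hypothetical sketch of the back-filling scenario: a month of changes is split into weekly change lists, and for each list the two candidate readings of "updated" are computed side by side (when the list was physically written vs. the latest change it records). The `Change` type, the weekly window and all dates are illustrative assumptions, not anything the spec defines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Change:
    url: str
    change: str  # "created", "updated" or "deleted"
    lastmod: datetime


def weekly_changelists(changes, start, weeks, generated_at):
    """Group changes into weekly windows and report both candidate 'updated' values."""
    lists = []
    for i in range(weeks):
        lo = start + timedelta(weeks=i)
        hi = lo + timedelta(weeks=1)
        window = [c for c in changes if lo <= c.lastmod < hi]
        lists.append({
            "from": lo,
            "until": hi,
            "entries": window,
            # Reading 1: when this document was physically generated.
            "updated_if_generation_time": generated_at,
            # Reading 2: the latest change the document records (None if empty).
            "updated_if_latest_change": max((c.lastmod for c in window), default=None),
        })
    return lists


utc = timezone.utc
start = datetime(2013, 4, 1, tzinfo=utc)
changes = [
    Change("http://example.org/a", "created", datetime(2013, 4, 2, 9, 0, tzinfo=utc)),
    Change("http://example.org/b", "updated", datetime(2013, 4, 10, 14, 30, tzinfo=utc)),
    Change("http://example.org/a", "updated", datetime(2013, 4, 11, 8, 15, tzinfo=utc)),
]
for cl in weekly_changelists(changes, start, 4, datetime(2013, 5, 10, tzinfo=utc)):
    print(cl["from"].date(), len(cl["entries"]), cl["updated_if_latest_change"])
```

Under the first reading every back-filled list carries the same date; under the second each list carries a date inside its own window, and an empty window has no meaningful value at all.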

5/ Sometimes items which were previously deleted get reinstated (this
can happen in DSpace, for example). Should ResourceSync care (i.e.
have an "undelete" change), or is a "created" change at the same URL
sufficient? I'm inclined to think the latter, but wanted to bring it
up in case anyone felt differently.
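One way to see why a plain "created" change may suffice: a Destination that replays change list entries in order ends up in the same state whether or not a dedicated "undelete" type exists. The minimal replay below is a hypothetical sketch; the `(url, change)` tuple layout and the example URL are assumptions.

```python
def replay(entries):
    """Apply (url, change) pairs in order; return the set of URLs that should exist."""
    present = set()
    for url, change in entries:
        if change in ("created", "updated"):
            # An item reinstated after deletion is simply present again.
            present.add(url)
        elif change == "deleted":
            present.discard(url)
        else:
            raise ValueError(f"unknown change type: {change}")
    return present


entries = [
    ("http://example.org/item/7", "created"),
    ("http://example.org/item/7", "deleted"),
    ("http://example.org/item/7", "created"),  # reinstated in the repository
]
print(replay(entries))
```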

6/ What should the publisher do if there is an item which is known to
have changed, but the date of that change is unknown? This is, sadly,
a real problem in DSpace, where I can tell when the parent container
of a file has been changed, and therefore that its files may have
changed, but can't guarantee that the last updated date of the parent
object is the same as the last updated date of each of the files
(since there is only one last updated date, and potentially many
files). For reasons which aren't clear, provenance on file changes in
DSpace is simply not stored in a machine-readable way! An acceptable
answer to this question would be: tough - DSpace's problem :)
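If a conservative workaround is acceptable, one option (an assumption, not something the spec prescribes) is to report every file under a changed parent as "updated" with the parent's timestamp. The Destination may then re-fetch files that did not actually change, but it should not miss any that did. The `Item` structure, the `conservative_changes` helper and the URLs are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Item:
    """Hypothetical DSpace-like item: one last-modified date, many files."""
    lastmod: datetime
    file_urls: list = field(default_factory=list)


def conservative_changes(items, since):
    """Emit an 'updated' entry for every file of every item modified after `since`.

    Per-file dates are unknown, so each file inherits its parent's timestamp.
    This over-reports (unchanged files may be listed) but never under-reports.
    """
    out = []
    for item in items:
        if item.lastmod > since:
            for url in item.file_urls:
                out.append((url, "updated", item.lastmod))
    return out


utc = timezone.utc
items = [
    Item(datetime(2013, 5, 9, 12, 0, tzinfo=utc),
         ["http://example.org/b/1.pdf", "http://example.org/b/2.txt"]),
    Item(datetime(2013, 4, 1, 12, 0, tzinfo=utc), ["http://example.org/c/1.pdf"]),
]
for entry in conservative_changes(items, datetime(2013, 5, 1, tzinfo=utc)):
    print(entry)
```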

Things are going well with the development, and I have published a
partial ResourceSync Java library at:
https://github.com/CottageLabs/ResourceSyncJava (it only covers the
bits of the spec that I'm actually using right now), and a DSpace
implementation at: https://github.com/CottageLabs/DSpaceResourceSync.
The DSpace implementation can now generate initial resource lists,
incremental change lists, and re-base the resource lists periodically,
and present a capability list and change list archive, consistent with
the outline design documented at:
http://cottagelabs.com/news/meeting-the-oaipmh-use-case-with-resourcesync.
More work still to be done, though.

Cheers,

Richard

--

Richard Jones,

Founder, Cottage Labs
t: @richard_d_jones, @cottagelabs
w: http://cottagelabs.com

Herbert van de Sompel

May 10, 2013, 11:31:50 AM
to Richard Jones, resour...@googlegroups.com, Herbert van de Sompel
Richard,

One quick response to (3) below:

The attributes we list in Section 3 are all in the resourcesync namespace, they don't need to be namespaced otherwise. Their semantics have been defined in other specs, but we inherit those semantics for attributes with the same names as the original ones minted in our own namespace.  We are not the first ones to do this as a means to avoid extensive namespacing. This is explained in the following paragraph of Section 3:

The document formats, as well as their ResourceSync extension elements, are shown in Table 3.1. The <rs:md> and <rs:ln> elements are introduced to express metadata and links, respectively. Both are in the ResourceSync XML Namespace and can have attributes. The attributes defined in this namespace are listed in Table 3.2 and detailed below. The <rs:ln> element as well as several of the ResourceSync attributes are based upon other specifications and in those cases inherit the semantics defined there; the "RFC" column of Table 3.2 refers to those specifications. Communities can introduce additional attributes when needed but must use an XML Namespace other than that of ResourceSync.

BTW: It would be nice if you could send us pointers to some capability documents you are generating.

Cheers

herbert

--
You received this message because you are subscribed to the Google Groups "ResourceSync" group.
To unsubscribe from this group and stop receiving emails from it, send an email to resourcesync...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.





--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/

==

Simeon Warner

May 10, 2013, 11:53:00 AM
to resour...@googlegroups.com
I'll take one of Richard's points for now, which resonates with some of
my thoughts:

On 5/10/13 10:13 AM, Richard Jones wrote:
> 1/ Should we have more guidance on what should be at the URL used for
> the link@rel="describedby" in the Capability List? The example in the
> spec points to a .xml file, but there's no further explanation as to
> what might be in that file. In my implementation, I have pointed this
> to a human readable web-page, as I'm not sure what else could really
> be in that location. Since the example points to an xml file, it
> makes me wonder what machine-readable content might be there. If
> there's no specific idea, I'd recommend changing this to expressly say
> that this is recommended to be human readable.

I think we need to have some way to point to certain capability lists as
having specific meanings, preferably machine-readable. One that seems
very obvious is the OAI-PMH replacement case: it would be good to be
able to say "this capability list is my metadata", maybe even "this
capability list is my DC metadata". Then by extension we might want to
say "this capability list is my full-text/PDF/whatever", though that very
quickly gets nebulous.

Thinking in semantic terms one would want to be able to add a class or such:

cap_list_1 a rs:metadata

or

cap_list_1 a rs:metadata_oai_dc

Might it be reasonable to use the "type" rel for this, whose stated
meaning is about right (the IANA link relations registry points to
http://tools.ietf.org/html/rfc6903#section-6)?

Cheers,
Simeon



Richard Jones

May 13, 2013, 4:18:43 PM
to Herbert van de Sompel, resour...@googlegroups.com
Hi Herbert,

On 10 May 2013 16:31, Herbert van de Sompel <hvd...@gmail.com> wrote:
> Richard,
>
> One quick response to (3) below:
>
> The attributes we list in Section 3 are all in the resourcesync namespace,
> they don't need to be namespaced otherwise.

Ok, I'll remove all of the namespacing of attributes in the
implementation that I've done - that will make life easier, for sure.

I've attached a couple of example documents that I've produced so far
(namespaces still included). Hopefully I'll have a publicly available
version of DSpace with RS capabilities quite soon, will let you know
when it is up.

Cheers,

Richard
capabilitylist.xml
changelistarchive.xml
resourcelist.xml

Richard Jones

May 13, 2013, 4:34:52 PM
to Simeon Warner, resour...@googlegroups.com
Hi Simeon,

>> 1/ Should we have more guidance on what should be at the URL used for
>> the link@rel="describedby" in the Capability List? The example in the
>> spec points to a .xml file, but there's no further explanation as to
>> what might be in that file. In my implementation, I have pointed this
>> to a human readable web-page, as I'm not sure what else could really
>> be in that location. Since the example points to an xml file, it
>> makes me wonder what machine-readable content might be there. If
>> there's no specific idea, I'd recommend changing this to expressly say
>> that this is recommended to be human readable.
>
>
> I think we need to have some way to point to certain capability lists as
> having specific meanings, preferably machine-readable. One that seems very
> obvious is the OAI-PMH replacement case: it would be good to be able to say
> "this capability list is my metadata", maybe even "this capability list is
> my DC metadata". Then by extension we might want to say "this capability list
> is my full-text/PDF/whatever", though that very quickly gets nebulous.

It could definitely be useful to have a way of indicating what - if
any - profile of RS an endpoint supports. But it feels like that in
itself is a whole other bit that needs to be specced out, if that's
the way we are going to do it. Perhaps best to leave it informal at
this point, and wait until there are sufficient such profiles to do
something about it?

> Thinking in semantic terms one would want to be able to add a class or such:
>
> cap_list_1 a rs:metadata
>
> or
>
> cap_list_1 a rs:metadata_oai_dc
>
> Might it be reasonable to use the "type" rel for this which says its meaning
> is about right (iana rels list points to
> http://tools.ietf.org/html/rfc6903#section-6 ) ?

That would probably make more sense than a "describedby" link in the
capability list. If you take a look at our PMH approach, we've put a
"describedby" link in each metadata resource, with the href pointing to
the namespace of the metadata format (see the documents attached to my
previous email, or this example):

<rs:ln atom:href="http://purl.org/dc/terms/" atom:rel="describedby" />

Do you think a "type" link instead would be better?

Cheers,

Richard





Martin Haye

May 14, 2013, 1:55:20 PM
to resour...@googlegroups.com
I saw in section 8.4 of the spec what appears to be a different approach to the same problem. It's a single resource list with two resources for each item -- one for the metadata, one for the PDF, and links between them marked with "describes" or "described-by". Is that the preferred way of doing what Simeon describes (and which is also a need that we have for eScholarship)?

Also a side note on section 8.4: the 'length' attributes in the example seem wonky. If they're the same PDF file, shouldn't it be the same length in both cases?

One final note: Great job on the new spec! I enjoyed reading through it and found it to be eminently clear, readable, and well focused on use cases.

--Martin

Herbert van de Sompel

May 14, 2013, 3:46:45 PM
to Martin Haye, resour...@googlegroups.com, Herbert van de Sompel
On Tue, May 14, 2013 at 11:55 AM, Martin Haye <marti...@ucop.edu> wrote:
> I saw in section 8.4 of the spec what appears to be a different approach to the same problem. It's a single resource list with two resources for each item -- one for the metadata, one for the PDF, and links between them marked with "describes" or "described-by". Is that the preferred way of doing what Simeon describes (and which is also a need that we have for eScholarship)?


There are two different things going on, and in both cases rel="describedby" is used:

(*) The case in Section 8.4, where a resource - say a PDF - that is subject to synchronization expresses a describedby relationship with another resource, which is a metadata resource that describes the PDF. This is shown for the first resource listed in Example 8.5.

(*) The case in Section 9.1, where a Capability List (which describes the capabilities of the Source) expresses a describedby relationship with another resource, which provides a description of the "set of resources" covered by the Capability List. This notion of "set of resources" is not yet explicit in version 0.6 but will be detailed in version 0.9. The idea is that a server can split its content up in a variety of ways: all metadata resources, all full-text resources, all video resources, all peer-reviewed content, all materials in French, etc. For each such "set of resources", the Source can have a dedicated Capability List. In such a Capability List, a link with a describedby relationship is provided to explain which "set of resources" the List covers.
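The two cases can be told apart by where the link sits. In the sketch below, a describedby link that is a direct child of `<urlset>` is treated as describing the document itself (the Capability List and its "set of resources"), while one inside a `<url>` describes that listed resource. This placement rule, the namespace URIs, the unprefixed `rel`/`href` attributes, the example URLs and the `describedby_links` helper are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"  # assumed ResourceSync namespace URI

DOC = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="describedby" href="http://example.org/about-this-set.html"/>
  <url>
    <loc>http://example.org/item/7.pdf</loc>
    <rs:ln rel="describedby" href="http://example.org/item/7?format=dc"/>
  </url>
</urlset>"""


def describedby_links(xml_text):
    """Split describedby links into document-level and per-resource ones."""
    root = ET.fromstring(xml_text)
    # findall on the root only matches direct children of <urlset>.
    doc_level = [ln.get("href") for ln in root.findall(RS + "ln")
                 if ln.get("rel") == "describedby"]
    per_resource = {}
    for url in root.findall(SM + "url"):
        loc = url.findtext(SM + "loc")
        per_resource[loc] = [ln.get("href") for ln in url.findall(RS + "ln")
                             if ln.get("rel") == "describedby"]
    return doc_level, per_resource


print(describedby_links(DOC))
```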
 
> Also a side note on section 8.4: the 'length' attributes in the example seem wonky. If they're the same PDF file, shouldn't it be the same length in both cases?

Very good catch! There are also problems with "hash" and "modified". Thanks!
 

> One final note: Great job on the new spec! I enjoyed reading through it and found it to be eminently clear, readable, and well focused on use cases.

That's great to hear. Thank you very much!

Herbert
 


Herbert van de Sompel

May 14, 2013, 3:58:24 PM
to Richard Jones, resour...@googlegroups.com
Richard,

Thanks! May I ask for one clarification? In the following, taken from the Resource List:

  <sm:url>
    <sm:loc>http://localhost:8080/xmlui/bitstream/123456789/7/1/ResearcherIdentifiers_TechnicalReport.pdf</sm:loc>
    <sm:changefreq>never</sm:changefreq>
    <rs:md atom:length="651281" atom:type="application/pdf" />
    <rs:ln atom:href="http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/" atom:rel="describedby" />
    <rs:ln atom:href="http://localhost:8080/xmlui/123456789/3" atom:rel="collection" />
  </sm:url>


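Do you mean by:

  <rs:ln atom:href="http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/" atom:rel="describedby" />

that the PDF file listed in <loc> is described by the metadata available at http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/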

If so, that is the intended use of describedby for the case of linking a "content" resource to the associated "metadata" resource.

Herbert

Richard Jones

May 14, 2013, 4:57:52 PM
to Herbert van de Sompel, resour...@googlegroups.com
Hi Herbert,

On 14 May 2013 20:58, Herbert van de Sompel <hvd...@gmail.com> wrote:
> Richard,
>
> Thanks! May I ask for one clarification. In the following, taken from the
> Resource List:
>
> <sm:url>
>   <sm:loc>http://localhost:8080/xmlui/bitstream/123456789/7/1/ResearcherIdentifiers_TechnicalReport.pdf</sm:loc>
>   <sm:changefreq>never</sm:changefreq>
>   <rs:md atom:length="651281" atom:type="application/pdf" />
>   <rs:ln
>     atom:href="http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/"
>     atom:rel="describedby" />
>   <rs:ln atom:href="http://localhost:8080/xmlui/123456789/3"
>     atom:rel="collection" />
> </sm:url>
>
>
> do you mean by:
>
> <rs:ln
> atom:href="http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/"
> atom:rel="describedby" />
>
> that the PDF file listed in <loc> is described by the metadata available at
> http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/
>
> If so, that is the intended use of describedby for the case of linking a
> "content" resource to the associated "metadata" resource.

Yup, that's right, that's the way we intended it. You should see
reciprocal rs:ln@rel="describes" links in the corresponding metadata
resource <url> element.
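A Destination (or the Source's own test suite) could check that pairing mechanically. The hypothetical `unreciprocated` helper below takes, per listed resource, its outgoing `(rel, href)` links and reports every describedby link whose target does not link back with describes. It assumes the metadata record's describes link points at the content resource's <loc>; the in-memory dict stands in for a parsed Resource List and reuses the URLs from the example above.

```python
def unreciprocated(links_by_loc):
    """Return (resource, metadata) pairs where describedby lacks a reciprocal describes.

    links_by_loc maps each <loc> URL to a list of (rel, href) tuples.
    """
    missing = []
    for loc, links in links_by_loc.items():
        for rel, href in links:
            if rel != "describedby":
                continue
            # The target may describe several resources, so look for this one.
            back = links_by_loc.get(href, [])
            if ("describes", loc) not in back:
                missing.append((loc, href))
    return missing


PDF = "http://localhost:8080/xmlui/bitstream/123456789/7/1/ResearcherIdentifiers_TechnicalReport.pdf"
META = "http://localhost:8080/dspace-resourcesync/123456789/7?format=http://purl.org/dc/terms/"

links = {
    PDF: [("describedby", META), ("collection", "http://localhost:8080/xmlui/123456789/3")],
    META: [("describes", PDF)],
}
print(unreciprocated(links))  # []
```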

The metadata record URLs are a bit awkward, because DSpace doesn't
have actual file resources for each item's metadata: we can generate the
resources in whatever formats we want, provided we can write the
crosswalks, hence the format=xxxxx URL query parameter. I'm still
having an internal debate over whether that's the best way to do the
URLs or not.
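On the URL question, one property worth knowing: the unencoded form used above and a fully percent-encoded form parse to the same parameter value, so a consumer using a standard query parser need not care which one the Source emits. The sketch uses Python's `urllib.parse`; `metadata_url` and `format_of` are hypothetical helpers, and the base URL is taken from the example earlier in the thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8080/dspace-resourcesync/123456789/7"
DC_TERMS = "http://purl.org/dc/terms/"


def metadata_url(base, fmt):
    """Hypothetical: build a metadata-record URL with a percent-encoded format parameter."""
    return f"{base}?{urlencode({'format': fmt})}"


def format_of(url):
    """Extract the 'format' query parameter from a metadata-record URL."""
    return parse_qs(urlsplit(url).query)["format"][0]


encoded = metadata_url(BASE, DC_TERMS)
unencoded = f"{BASE}?format={DC_TERMS}"  # form used in the DSpace example
print(encoded)
print(format_of(encoded) == format_of(unencoded) == DC_TERMS)
```

Percent-encoding only starts to matter if a format identifier ever contains characters such as `&` or `#`, which would otherwise end the parameter or the query early.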

Cheers,

Richard

Graham Klyne

May 30, 2013, 7:24:51 AM
to Richard Jones, Simeon Warner, resour...@googlegroups.com
On 13/05/2013 21:34, Richard Jones wrote:
> It could definitely be useful to have a way of indicating what - if any -
> profile of RS an endpoint supports. But it feels like that in itself is a
> whole other bit that needs to be specced out, if that's the way we are going
> to do it. Perhaps best to leave it informal at this point, and wait until
> there are sufficient such profiles to do something about it?
Without having an immediate grasp of the details, I thought this is what the
capability list is intended to do (among other things).

(I'm still reloading my mental context for this work, following an extended absence,
so I may well misunderstand the comment.)

#g