Re: [NISO resourcesync_d2d] Review comments (was: ResourceSync version 0.6)


Martin Klein

Jun 4, 2013, 1:27:28 PM
to Graham Klyne,
Hi Graham,

Thx a lot for your feedback. Pls see the LANL comments inline. 
We are currently working on version 0.9 of the spec and are planning to release it within the month. Comments not specifically addressed below will be fixed/clarified in v0.9.


## Resource dump and change dump manifests (Ex. 2.4, sect 2.2.1, 5.1.1, 7.1.1, etc)

At several points, I found myself wondering if the content-type of the resource should be mandatory here.

Like Simeon, we think it would be too restrictive to mandate the attribute (e.g., what is a Source to do for resources that have no IANA defined Media-Type?) but we do think a strong recommendation to include the attribute is reasonable.

## 2.2.1. Source Perspective

Para 6, "Linking to Related Resources":  I was thinking how and where this linking would be applied.  I think a forward reference to section 8 would help here.

General: I'm not seeing any discussion of resource subsets.  IIRC, these show up clearly in the use cases, and I thought they were mentioned in the Michigan meeting.  Section 9.1 has a general allusion to the notion of different subsets of resources, but I'm not seeing any clear guidance how this is done.  I think it's handled through multiple capability lists, which begs some questions about how discovery might work.  (I can guess at how I think it's meant to happen, but I think the spec should be clearer than that.)

The notion of sets of resources, with multiple Capability Lists that each represent the capabilities offered for one set of resources, will be one of the major changes in v0.9 of the spec.

## 2.2.2. Destination Perspective

General comment, about timing and synchronization:  I think some discussion of timing and possible optimisations (especially based on change lists and change dumps) would be helpful here.  When can a destination pass over a particular set of changes.  What information should it keep about past synchronisation operations?  Is it always required to scan any change list that it finds, or are there any outer-level indicators that can be used to safely ignore them?

While important, we feel that this is more on the implementation side and should be discussed in detail in a separate document.
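To make the question about passing over change lists concrete, here is a minimal sketch of one possible destination-side optimisation. It assumes that each change list advertises an "until" timestamp (as discussed for <rs:md> below) and that the destination remembers when it last synchronized. The dictionary layout, the example.com URLs, and the function names are hypothetical and not taken from the spec.

```python
from datetime import datetime

def parse_w3c(ts: str) -> datetime:
    """Parse a W3C datetime such as 2013-06-04T13:27:28Z (assumed UTC 'Z' suffix)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def lists_to_process(change_lists, last_sync: datetime):
    """Keep only change lists that may contain changes newer than last_sync.

    A change list whose 'until' is not later than last_sync covers a period
    the destination has already processed, so it can be passed over.
    """
    return [cl for cl in change_lists if parse_w3c(cl["until"]) > last_sync]

# Hypothetical change lists advertised by a source.
change_lists = [
    {"uri": "http://example.com/changelist1.xml", "until": "2013-06-01T00:00:00Z"},
    {"uri": "http://example.com/changelist2.xml", "until": "2013-06-03T00:00:00Z"},
]
last_sync = parse_w3c("2013-06-02T00:00:00Z")
pending = lists_to_process(change_lists, last_sync)
```

Under these assumptions the only state a destination needs to keep is the timestamp of its last successful synchronization.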

## 2.2.4. Discovery Perspective

I think a forward reference to section 9 would be helpful.

Will be added in v0.9.

## 3. Sitemap Document Formats

In light of Richard's comments, is it worth underscoring that the attributes are presented without namespace prefixes?

Will be added in v0.9. 
Also, as per our discussion at the last in-person meeting, v0.9 will detail the use of the @from and @until attributes of the <rs:md> element (as a child element of <urlset>).

Para 3: <url>/<rs:md>, "capability" attribute:  I think this can appear only on a top-level <urlset> or <sitemapindex> element, but the text kind-of suggests it can appear anywhere.  Saying "When the attribute is not used, this signifies that the resource is subject to synchronization" seems to create a kind of "post hoc ergo propter hoc" kind of relation, which seems a bit confusing to me. Suggest just specify where it may appear.

Will be more clear in v0.9. 

Para 3: <url>/<rs:md>, "hash" attribute:  if the source supports content negotiation for this resource, what does the hash refer to?  (suggest: the representation returned when no Accept: header is specified).

We think that is addressed in Sec 3 and Table 3.2 - the hash value is based on a particular resource representation. If the URI provided in <loc> does not identify a particular representation, @hash should not be used.
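As an illustration of why the hash is tied to one representation, here is a minimal sketch that computes a digest over the bytes of a representation. The "md5:" prefix and the function name are assumptions made for this example, not a statement of the attribute syntax in the spec.

```python
import hashlib

def representation_hash(body: bytes) -> str:
    """Return an 'md5:<hex>' string for the bytes of one concrete representation."""
    return "md5:" + hashlib.md5(body).hexdigest()

# Two representations of the "same" resource, as content negotiation might return.
as_html = b"<p>hello</p>"
as_text = b"hello"

hash_html = representation_hash(as_html)
hash_text = representation_hash(as_text)
```

The two digests differ, so a single @hash value cannot describe a URI that is subject to content negotiation.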

Para 3: <url>/<rs:md>, "path" attribute:  saying it "conveys the file path of..." seems a bit unclear to me, in that it seems to refer to a file in the host file system, whose format would depend on the system.  I think a description that is more clearly system neutral would be more helpful to ensure interop.  A particular case I was thinking about was:  what if a path segment contains a "/" character - how would that be represented?  When using ResourceSync to create a mirror of a web site (which needs to allow relative references to work properly), is it safe to use the path value to construct a URI for a mirrored resource, or should that be derived from the <loc> element?

Good point, we had not considered this before. Are you aware of a reference document that we could point to? How does SWORD, BagIt etc address this issue?
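One system-neutral option, sketched here only as an assumption and not as something the spec defines, is to treat the path like a URI path: segments separated by "/", with any "/" inside a segment percent-encoded as in RFC 3986. The function names and the sample segments are hypothetical.

```python
from urllib.parse import quote, unquote

def encode_path(segments):
    """Join path segments with '/', percent-encoding reserved characters in each."""
    # safe="" makes quote() encode '/' too, so a slash inside a segment survives.
    return "/" + "/".join(quote(s, safe="") for s in segments)

def decode_path(path):
    """Split an encoded path back into its original segments."""
    trimmed = path[1:] if path.startswith("/") else path
    return [unquote(s) for s in trimmed.split("/")]

segments = ["reports", "2013/06", "summary.pdf"]  # middle segment contains '/'
path = encode_path(segments)
round_trip = decode_path(path)
```

With this encoding the segment boundary stays unambiguous, and the destination can map segments onto whatever separator its own file system uses.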

## 8.1. Mirrored Content

General: is a mirror required to always deliver identical content to the original source?  I.e. is it safe to assume that the rs:md@hash for the original resource also applies to the mirror?  If not, should there be some way to give a different hash for the mirror?

There is obviously no guarantee that the content and the attributes are identical but they are expected to be. For the ResourceSync spec this assumption seems sufficient.

## 8.2. Alternate Representations

"• A recommended type attribute that conveys the Media Type of the alternate representation."

Should there also be a way to indicate other HTTP headers that can affect the result returned (i.e. values for other headers mentioned in an HTTP response 'Vary:' header)?  Language and device type have been mentioned.  I don't think the specification should cover this in detail, but I think this might be a candidate for using an extension point, such as additional rs:md attributes.

We decided against using HTTP headers. However, the spec allows for additional, properly "namespaced" attributes.

## 8.3. Patching Content

How can a destination be sure that a patch is applicable to a particular version of a resource that it has?  I think a way to specify the hash of the resource representation to which the patch applies would be appropriate.  (I think the hashes expressible here are the hash of the resulting resource and the hash of the patch itself.)

This, too, might be better addressed in the separate document mentioned above.
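The check proposed in the comment can be sketched as follows. It assumes that the source publishes both the hash of the patch and the hash of the representation the patch applies to; the latter is the suggested addition, not something the current draft provides. All names are hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()

def check_patch(local_copy: bytes, patch_body: bytes,
                base_hash: str, patch_hash: str) -> str:
    """Decide whether a downloaded patch may be applied to the local copy.

    base_hash  : hash of the representation the patch was computed against
                 (assumed to be published by the source).
    patch_hash : hash of the patch itself.
    """
    if md5_hex(patch_body) != patch_hash:
        return "patch-corrupt"   # the download does not match what was advertised
    if md5_hex(local_copy) != base_hash:
        return "base-mismatch"   # local copy is not the version the patch expects
    return "ok"

local = b"version 1 of the resource"
patch = b"hypothetical patch bytes"
status = check_patch(local, patch, md5_hex(local), md5_hex(patch))
```

On "base-mismatch" a destination would presumably fall back to fetching the full resource instead of patching.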

## 8.5. Prior Versions of Resources

"A second approach consists of pointing to a TimeGate associated with the time-generic resource. A TimeGate supports negotiation in the datetime dimension, as introduced in the Memento protocol [Memento Internet Draft], ...".

The referenced document here, like all IETF Internet Drafts, says: "Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress"."

As such, it seems to me that this is an inappropriate citation in a document that is in the latter throes of becoming a standard.  Do you not have a more permanent specification for Memento?  (I'd suggest requesting Informational RFC publication for the current Memento draft, through the RFC Editor independent submissions stream.  This does not preclude a later move for standard status when you're ready to do the IETF last-call dance.  Or you could just try for non-WG standard status through the IETF.  But I'm not sure if either of these would happen on a suitable timeframe for ResourceSync approval as a NISO standard.)

v0.9 will refer to the Memento ID hosted at

## 8.7. Republishing Resources

Provenance is mentioned in the motivation for this - I can't help wondering if a W3C Provenance vocabulary URI might be more appropriate than a new link relation; e.g. prov:wasDerivedFrom or prov:wasQuotedFrom (the latter seems a bit odd, but I think the use here falls within the defined meaning).

We currently use the link relation "via" which is defined in RFC 4287. We took some time to look at the prov link relations but we don't feel they are more suitable than "via" for the given context.

That's all from me!  Nice job!

