ReSync call summary 01/23/2013

Martin Klein

Jan 23, 2013, 3:12:01 PM
to resour...@googlegroups.com
Hi all,

Please see below a summary of the items we discussed today and the
decisions we made:

7.0) rel type "preferred"
- RFC 6249 adoption: rel="duplicate" pref="true" vs rel="duplicate"
- RFC 6249 adoption: rel="duplicate" pri="1" vs rel="duplicate"
pri="2" vs rel="duplicate"
- only for mirrors, losing sense of generality as we don't have
preferred for other rel types such as timegate
- possible to inherit for other rel types
decision: in favor of "pri" attribute, don't use preference
attribute, also use "pri" if needed for other rel types
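
A minimal sketch of how a Destination might order mirrors once "pri" is adopted. It assumes, as in RFC 6249, that a lower "pri" value means a higher priority. The dict layout, the example.com URIs, and the choice to sort links without "pri" last are illustrative assumptions, not part of the decision.

```python
def order_by_pri(links):
    """Return the rel="duplicate" links, lowest pri value first."""
    duplicates = [link for link in links if link.get("rel") == "duplicate"]
    # Assumption: links that carry no pri attribute sort after prioritized ones.
    return sorted(
        duplicates,
        key=lambda link: int(link["pri"]) if "pri" in link else float("inf"),
    )


# Hypothetical links for one resource; only rel, href and pri are modelled.
links = [
    {"rel": "duplicate", "href": "http://mirror2.example.com/res1", "pri": "2"},
    {"rel": "describedby", "href": "http://example.com/res1.meta"},
    {"rel": "duplicate", "href": "http://mirror1.example.com/res1", "pri": "1"},
    {"rel": "duplicate", "href": "http://mirror3.example.com/res1"},
]

for link in order_by_pri(links):
    print(link["href"])
```

The same ordering would apply unchanged if "pri" is later used for other rel types; only the rel filter would differ.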

7) problem of parallel synchronization of "resource" and "metadata
about the resource"
- introduces need to link bidirectionally between the two
- what rel type to use?
decision: use rel types: "describes" and "isdescribedby"
http://tools.ietf.org/html/draft-wilde-describes-link-02
http://www.w3.org/TR/powder-dr/
- will be reflected in new subsection in section 8

8) idea for a notification framework spec/document, potentially based
on push technology
- could include former section 8: Pushing Changes
- to be discussed later

9) notion of collection consistency across capabilities
- does the set of resources exposed via each capability have to be
the same for each capability?
- if not, Destination needs to consume all capabilities for Audit
decision: yes, the set has to be the same
- mention in 2.2.3 and be explicit in section 10

10) is the ZIP format the recommended or THE ONLY format
- if only recommended, meaning more than one are possible, should we
include the "type" attribute in link
decision: ONLY ONE packaging format at a time, in a capability list
- mandate "type" attribute
- ZIP is the recommended format (tar.* possible)

11) Resource List terminology
- proposed "complete Resource List" (sitemapindex + urlset)
decision: Simeon will rephrase section 4
- terminology for grouping into index, proposals: paginate, split
into manageable chunks
decision: Simeon will rephrase section 4

12) do we need/want to register our relation type to avoid the full
URI in sections 10.3.2 and 10.3.3
- if yes, can we do it with this spec
- would also be a candidate to replace rel="top" which is not defined anyways
- proposed: rel="resourcesync"
- proposed: rel="service" as defined in RFC 5023
http://tools.ietf.org/html/rfc5023
- other alternatives: "content", "index"
decision: use rel type "resourcesync"
- can only point to capability list that pertains to the resource
that provides the link
- if there is a capability list index, client has to use the "up"
link from the capability list to get there
- capability list is not mandatory but very strongly recommended
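
The discovery path decided above (resource, then capability list via "resourcesync", then capability list index via "up") could be sketched as follows. The in-memory `links_by_uri` table stands in for fetching and parsing documents, and all URIs and function names are hypothetical.

```python
def first_link(links, rel):
    """Return the href of the first link with the given rel, or None."""
    for link in links:
        if link.get("rel") == rel:
            return link.get("href")
    return None


def discover(resource_uri, links_by_uri):
    """Follow rel="resourcesync", then rel="up", starting from a resource."""
    capability_list = first_link(links_by_uri.get(resource_uri, []), "resourcesync")
    if capability_list is None:
        # A capability list is strongly recommended but not mandatory.
        return None, None
    index = first_link(links_by_uri.get(capability_list, []), "up")
    return capability_list, index


# Hypothetical link table: URI -> links exposed by that resource or document.
links_by_uri = {
    "http://example.com/res1": [
        {"rel": "resourcesync", "href": "http://example.com/capabilitylist.xml"},
    ],
    "http://example.com/capabilitylist.xml": [
        {"rel": "up", "href": "http://example.com/capabilitylistindex.xml"},
    ],
}

print(discover("http://example.com/res1", links_by_uri))
```

The resource never links to the index directly; the client reaches it only through the capability list, as noted above.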

13) ResourceSync assertions vs. HTTP assertions
- what to do when they don't match
- e.g., modified attribute != Last-Modified, hash != Content-MD5, etc
proposal (MLN): discuss and give guidance in section 3, add to table 3.2
decision: acknowledge that lastmod is special case
- point out that if fixity info diverges, system is in flux
- add to text in section 3, do not add to table 3.2
- MLN provides mapping of attributes and HTTP headers

14) should we acknowledge other sitemap extensions
- expires
- <changefreq>never</changefreq>
- "never. Use this value for archived URLs."
- http://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668
decision: mention <changefreq>never</changefreq> in section 3 as an
interesting case for synchronization
- no mention of <priority>

cheers
Martin

Martin Klein

Jan 23, 2013, 4:52:07 PM
to resour...@googlegroups.com
The rel type to be used is, of course, "describedby" and not "isdescribedby".
My apologies for the confusion!

7) problem of parallel synchronization of "resource" and "metadata
about the resource"
- introduces need to link bidirectionally between the two
- what rel type to use?
decision: use rel types: "describes" and "describedby"
m

Richard Jones

Jan 23, 2013, 5:47:14 PM
to Martin Klein, resour...@googlegroups.com
Hi Folks,

Sorry I haven't been able to make the calls recently.

I'm starting to get ready to do some software work around the
specification, and wonder if you feel it's too early to begin that
with these discussions still going. I am going to first update the
use cases document with implementation notes, and then we plan to go
on and do a kind of OAI-PMH equivalent for DSpace. As part of the
process we'll start to develop a generic server side library in Java.

Thoughts?

Cheers,

Richard

--

Richard Jones,

Founder, Cottage Labs
t: @richard_d_jones, @cottagelabs
w: http://cottagelabs.com

Herbert van de Sompel

Jan 23, 2013, 5:49:42 PM
to Richard Jones, Martin Klein, resour...@googlegroups.com
Richard,

We intend to have a revised spec out to the tech group early next week
and public at the end of January or in early February. So, you should be
good to go at that time.

Thanks

Herbert

--
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/

==

Richard Jones

Jan 23, 2013, 5:51:54 PM
to Herbert van de Sompel, Martin Klein, resour...@googlegroups.com
Hi Herbert,

Perfect, I will wait on the next version of the spec before taking on
the use cases.

Cheers,
Richard

Simeon Warner

Jan 24, 2013, 10:41:46 AM
to resour...@googlegroups.com
Hi Herbert, Martin,

Sorry I ran out before the end of the call yesterday. I'd like to
clarify timing. I would like to have the opportunity to do a careful
reread and send suggested edits to Martin before we put this out public.
If I devote Wednesday to this and send edits by the end of Wednesday,
will that work?

Cheers,
Simeon

Martin Klein

Jan 24, 2013, 10:47:42 AM
to Simeon Warner, resour...@googlegroups.com
Yes, that would work. I will send you the up-to-date version on Tuesday night.

m

Michael Nelson

Jan 28, 2013, 9:41:38 PM
to resour...@googlegroups.com

On Wed, 23 Jan 2013, Martin Klein wrote:

> 13) ResourceSync assertions vs. HTTP assertions
> - what to do when they don't match
> - e.g., modified attribute != Last-Modified, hash != Content-MD5, etc
> proposal (MLN): discuss and give guidance in section 3, add to table 3.2
> decision: acknowledge that lastmod is special case
> - point out that if fixity info diverges, system is in flux
> - add to text in section 3, do not add to table 3.2
> - MLN provides mapping of attributes and HTTP headers
>

(I'm not sure how this should be formatted; please adjust as you see fit)

* modified --> Last-Modified (RFC 2616)

While the value of the modified attribute should correspond to the value
in the Sitemap-defined <lastmod> element, it is possible that it might not
match the value of the HTTP header Last-Modified. This could indicate
that the Resource List or Change List is stale and the resource has
changed since either list was generated, or it could indicate that the
change reported via HTTP is not semantically important enough to warrant
an update in the Resource List (see the corresponding note in the Sitemap
document).

* length --> Content-Length (RFC 2616)

The value of the length attribute should match the value of the HTTP
header Content-Length. If they do not match, then the resource has
undergone a change since the Resource List or Change List was generated.

* hash --> Digest (RFC 3230), Content-MD5 (RFC 2616)

The value of the hash attribute should correspond with the value of the
HTTP header Digest. If they do not match, then the resource has
undergone a change since the Resource List or Change List was generated.
The value of the hash attribute may correspond with the value of the HTTP
header Content-MD5, if 1) the digest algorithm is MD5, and 2) the HTTP
response is not a 206 (partial content), in which case the Content-MD5
applies to the digest of the partial response and not the entire resource.

* type --> Content-Type (RFC 2616)

The value of the type attribute should match the value of the HTTP
header Content-Type. Since it is rare for resource representations to
change their Content-Type, a mismatch more likely indicates that the
described resource is subject to content negotiation.

* etag --> ETag (RFC 2616)

The value of the etag attribute should correspond to the value of the HTTP
header ETag. ETag values are opaque and generated by the HTTP server,
thus they may or may not represent a ResourceSync change event.
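
A rough sketch of how a Destination might check the mapping above against the headers of an HTTP response. Several things here are assumptions rather than anything the spec says: modified is taken to be a W3C datetime, hash is taken to be "md5:" followed by a hex digest, Content-Type parameters such as charset are ignored, and the Digest header and etag are left out. The function name and sample values are made up.

```python
import base64
import hashlib
from datetime import datetime
from email.utils import parsedate_to_datetime


def find_mismatches(attrs, headers, status=200):
    """Return the names of attributes that disagree with the HTTP headers."""
    mismatches = []

    if "modified" in attrs and "Last-Modified" in headers:
        # Assumption: modified looks like 2013-01-23T15:12:01Z.
        listed = datetime.fromisoformat(attrs["modified"].replace("Z", "+00:00"))
        if listed != parsedate_to_datetime(headers["Last-Modified"]):
            mismatches.append("modified")

    if "length" in attrs and "Content-Length" in headers:
        if int(attrs["length"]) != int(headers["Content-Length"]):
            mismatches.append("length")

    if "type" in attrs and "Content-Type" in headers:
        # Assumption: compare the media type only, ignoring parameters.
        served = headers["Content-Type"].split(";")[0].strip().lower()
        if attrs["type"].strip().lower() != served:
            mismatches.append("type")

    # Content-MD5 covers only the partial body on a 206, so skip it then.
    if "hash" in attrs and "Content-MD5" in headers and status != 206:
        algorithm, _, value = attrs["hash"].partition(":")
        if algorithm.lower() == "md5":
            served_hex = base64.b64decode(headers["Content-MD5"]).hex()
            if value.lower() != served_hex:
                mismatches.append("hash")

    return mismatches


# Made-up sample: attributes from a Resource List and matching headers.
body = b"example"
digest = hashlib.md5(body).digest()
attrs = {
    "modified": "2013-01-23T15:12:01Z",
    "length": str(len(body)),
    "type": "text/plain",
    "hash": "md5:" + digest.hex(),
}
headers = {
    "Last-Modified": "Wed, 23 Jan 2013 15:12:01 GMT",
    "Content-Length": str(len(body)),
    "Content-Type": "text/plain; charset=utf-8",
    "Content-MD5": base64.b64encode(digest).decode("ascii"),
}
stale_headers = dict(headers, **{"Content-Length": "9"})

print(find_mismatches(attrs, headers))
print(find_mismatches(attrs, stale_headers))
```

A mismatch on modified alone does not necessarily mean the list is stale, for the reasons given under modified above; a mismatch on length or hash is the stronger signal.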

***MLN note: I've kicked around different text for etag/ETag, and I'm now
convinced this is something we could drop from the spec. Yeah, it's
defined in the Atom link extensions draft, but they're actually of no use
in Resource Dumps since we don't know how the server generated them. The
common Apache method is to base ETags on a file's inode, lastmod, and
size:

http://httpd.apache.org/docs/2.2/mod/core.html#fileetag

Between length, modified, and digest I don't think we need anything else.
ETag exists in HTTP so you can describe changes independent of time
ordering (e.g., ETag=x and ETag=y tells you they're different, but you
can't sort the changes according to time). But that's really not the
purpose of ResourceSync.

We could probably leave it in and nothing bad would happen, but I'm not
sure it would ever get used (or used correctly, anyway).

regards,

Michael

>
> cheers
> Martin
>

----
Michael L. Nelson m...@cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
