Re: [resourcesync] ResourceSync call tomorrow 12.30pm EST


Michael Nelson

Jan 23, 2013, 12:00:03 PM
to resour...@googlegroups.com

All,

Simeon and I have been discussing the rel="preferred" issue. We're not
sure how to proceed, so we're throwing this back to the group. Here are
the three main options as we see them:

* Option #1 -- define "preferred" as a subclass of "duplicate"

rel="duplicate" is defined in RFC 6249:

http://tools.ietf.org/html/rfc6249

o Relation Name: duplicate

o Description: Refers to a resource whose available representations
are byte-for-byte identical with the corresponding representations
of the context IRI.

see below for some text that Simeon wrote for this option.

good: you can unambiguously state a preference for:

<rs:ln rel="preferred"
href="http://mirror1.example.com/res1"/>

<rs:ln rel="duplicate"
href="gsiftp://gridftp.example.com/res1"/>

while still conveying that these are byte-equivalent.

bad: you can't say this:

<rs:ln rel="preferred"
href="http://mementoproxy.lanl.gov/timegate/http://example.com/res1"/>

<rs:ln rel="timegate"
href="http://mementoproxy.cs.odu.edu/timegate/http://example.com/res1"/>

because while these timegates should give you nearly the same answer, in
practice they'll give you slightly different answers (byte-equivalency is
not really even defined for this kind of statement).
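Under Option #1, the byte-for-byte promise is something a client can actually check. Below is a minimal sketch in Python. It assumes the client has already fetched both representations; fetching is left out. The helper names `digest` and `byte_identical` are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 of the raw bytes of one representation."""
    return hashlib.sha256(data).hexdigest()


def byte_identical(a: bytes, b: bytes) -> bool:
    """rel="duplicate" (and, under Option #1, rel="preferred") promises this is True."""
    return digest(a) == digest(b)


if __name__ == "__main__":
    # Hypothetical bodies, standing in for what mirror1 (HTTP) and gridftp returned
    from_mirror = b"<res1/>"
    from_gridftp = b"<res1/>"
    print(byte_identical(from_mirror, from_gridftp))  # True
```

Comparing digests rather than raw bytes means the same value could also be checked against a hash published elsewhere. No such check is meaningful for the timegate case, which is exactly the limitation described above.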

* Option #2 -- have a squishy, vague defn of rel="preferred"

having read lots of RFCs, I've come to appreciate the art of being
clear while leaving lots of wiggle room. some starter text to this effect:

a server recommendation about a resource with preferential status from a
list of similar, possibly identical, resources. if multiple preferences
are indicated, the client should explore the relations further with
respect to content-type, content-size, other relation types, etc.

this would effectively dodge the question of "preferred what?" when you
see a single rel="preferred" statement (you can't really assume that it is
a "preferred alternate" or anything like that; it is simply "preferred".
preferred what? just "preferred".)

good: we'd allow general preference statements:

rel="preferred duplicate"
rel="preferred timegate"

bad: some assembly required by the client.

technically, the above statements would expand into something like:

r1 duplicate r2
r1 preferred r2
r1 duplicate r3
r1 timegate r4
r1 preferred r4
r1 timegate r5

leaving the client to do a join on the href values to determine that r2 is
the preferred duplicate and r4 is the preferred timegate.
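The client-side join for Option #2 is small, but it is real work. Below is a sketch in Python, built on two assumptions:

- the rel values above are attached to hypothetical targets r2..r5;
- each rel attribute is a space-separated list of relation types, as with HTML link types.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical links from context resource r1, Option #2 style:
# a rel attribute may hold several space-separated relation types.
LINKS: List[Tuple[str, str]] = [
    ("r2", "preferred duplicate"),
    ("r3", "duplicate"),
    ("r4", "preferred timegate"),
    ("r5", "timegate"),
]

Triple = Tuple[str, str, str]


def expand(context: str, links: List[Tuple[str, str]]) -> List[Triple]:
    """One (context, relation, target) triple per relation type in each rel."""
    return [(context, rel, href) for href, rels in links for rel in rels.split()]


def preferred_by_type(triples: List[Triple]) -> Dict[str, List[str]]:
    """Join on the target: it is the preferred X if it carries both X and "preferred"."""
    rels_by_target: Dict[str, set] = defaultdict(set)
    for _, rel, href in triples:
        rels_by_target[href].add(rel)
    result: Dict[str, List[str]] = {}
    for href, rels in rels_by_target.items():
        if "preferred" in rels:
            for rel in sorted(rels - {"preferred"}):
                result.setdefault(rel, []).append(href)
    return result


if __name__ == "__main__":
    triples = expand("r1", LINKS)
    for triple in triples:
        print(*triple)
    print(preferred_by_type(triples))  # {'duplicate': ['r2'], 'timegate': ['r4']}
```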

* Option #3 -- have single, hyphenated rel types:

rel="preferred-duplicate"

good: unambiguous. narrow scope; let memento worry about
"preferred-timegate", etc.

bad: unambiguous. narrow scope; now memento has to worry about
"preferred-timegate", etc.


regards,

Michael


===============preferred as sub-class of duplicate=================

- rel="preferred" specifies a preferred URI from which a client may
download the same content. Once it is specified, applications such as
search engines should download content from the preferred URI.

- rel="preferred" is orthogonal to rel="canonical" [RFC 6596], which
specifies the canonical IRI for indexing, not the preferred download
location.

- If there are multiple rel="preferred" links from a single resource,
these are considered to be equally preferred and a client may pick any
one. (Large-scale clients might select one at random to distribute load
between servers.)

- Like rel="duplicate" [RFC 6249], rel="preferred" refers to a resource
whose available representations are byte-for-byte identical with the
corresponding representations of the context IRI.

- rel="preferred" has the meaning of rel="duplicate" [RFC 6249] with the
optional "pref" attribute.
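A client following these semantics might pick a download location as sketched below. This is not proposed spec text. The (rel, href) tuples stand in for parsed <rs:ln> elements. Falling back to rel="duplicate" when no preferred link exists is an assumption, based on "preferred" implying "duplicate".

```python
import random
from typing import List, Optional, Tuple

Link = Tuple[str, str]  # (rel, href), standing in for a parsed <rs:ln> element


def choose_download(links: List[Link], rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a download location under the "preferred as sub-class of duplicate" semantics.

    All rel="preferred" links are equally preferred, so one is chosen at random
    to spread load. Falling back to rel="duplicate" is an assumption: both
    promise byte-identical content.
    """
    rng = rng or random.Random()
    preferred = [href for rel, href in links if rel == "preferred"]
    if preferred:
        return rng.choice(preferred)
    duplicates = [href for rel, href in links if rel == "duplicate"]
    return rng.choice(duplicates) if duplicates else None


if __name__ == "__main__":
    links = [
        ("preferred", "http://mirror1.example.com/res1"),
        ("preferred", "http://mirror2.example.com/res1"),  # hypothetical second mirror
        ("duplicate", "gsiftp://gridftp.example.com/res1"),
    ]
    print(choose_download(links))  # one of the two mirrors
```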

so we might have something like the following straw man:


IANA Considerations

o Relation Name: preferred

o Description: Refers to a resource whose available representations are
byte-for-byte identical with the corresponding representations of the
context IRI, and which is a preferred download location that should be
used to obtain representations of the context IRI.

o Notes:

1. The preferred relation implies the semantics of rel="duplicate"
[RFC 6249] and adds further meaning. It *is/is-not*(???) recommended
that rel="duplicate" also be specified (I lean toward recommending both,
since that gives nicer dumb-down).

2. This relation is for static resources. That is, an HTTP GET request on
any duplicate will return the same representation. It does not make
sense for dynamic or POSTable resources and should not be used for them.

3. This relation is orthogonal to rel="canonical" [RFC 6596], which
specifies the canonical IRI for indexing. In fact, it is likely that in
many situations the canonical IRI will not be the preferred download
location.






On Tue, 22 Jan 2013, Martin Klein wrote:

> Hi all,
>
> Just a friendly reminder about our call tomorrow at 12.30pm EST.
>
> Below is a list of issues that we have not yet covered and hence are,
> from the LANL point of view, to be discussed tomorrow:
>
> 7) problem of parallel synchronization of "resource" and "metadata
> about the resource"
> - introduces need to link bidirectionally between the two
> - what rel type to use?
>
> 8) idea for a notification framework spec/document, potentially based
> on push technology
> - could include former section 8: Pushing Changes
>
> 9) notion of collection consistency across capabilities
> - does the set of resources exposed via each capability have to be
> the same for each capability?
> - if not, Destination needs to consume all capabilities for Audit
>
> 10) is ZIP the recommended format or THE ONLY format?
> - if only recommended, meaning more than one format is possible, should we
> include the "type" attribute in links?
>
> 11) Resource List terminology
> - proposed "complete Resource List" (sitemapindex + urlset)
> - terminology for grouping into index, proposals: paginate, split
> into manageable chunks
>
> 12) do we need/want to register our relation type to avoid the full
> URI in sections 10.3.2 and 10.3.3
> - if yes, can we do it with this spec
> - would also be a candidate to replace rel="top", which is not defined anyway
> - proposed: rel="resourcesync"
> - proposed: rel="service" as defined in RFC 5023
> http://tools.ietf.org/html/rfc5023
>
> 13) ResourceSync assertions vs. HTTP assertions
> - what to do when they don't match
> - e.g., modified attribute != Last-Modified, hash != Content-MD5, etc
> proposal (MLN): discuss and give guidance in section 3, add to table 3.2
>
> 14) should we acknowledge other sitemap extensions
> - expires
> - <changefreq>never</changefreq>
> - "never. Use this value for archived URLs."
> - http://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668
>
> 15) next steps
> - writing/editing
> - public release date 01/31/2013
>
> Please feel free to add issues if you see fit.
>
> cheers
> Martin
>
> --
>
>

----
Michael L. Nelson m...@cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
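For item 13 (ResourceSync assertions vs. HTTP assertions), detecting the mismatch is the easy part. The guidance in section 3 would cover what to do about it.

Below is a sketch in Python. It assumes the ResourceSync hash is carried as a hex MD5 digest; that format is an assumption, not something given here. Content-MD5 (RFC 1864) carries the base64 encoding of the raw digest, so the two forms must be normalized before comparing.

```python
import base64
from datetime import datetime
from email.utils import parsedate_to_datetime


def lastmod_matches(lastmod: str, last_modified_header: str) -> bool:
    """Compare a W3C datetime (<lastmod> / modified) with an HTTP Last-Modified value.

    HTTP-dates have one-second resolution, so sub-second parts are dropped.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    rs_time = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
    http_time = parsedate_to_datetime(last_modified_header)
    return rs_time.replace(microsecond=0) == http_time


def md5_matches(rs_hex_md5: str, content_md5_header: str) -> bool:
    """Compare a hex MD5 (assumed ResourceSync form) with Content-MD5 (base64, RFC 1864)."""
    as_base64 = base64.b64encode(bytes.fromhex(rs_hex_md5)).decode("ascii")
    return as_base64 == content_md5_header.strip()


if __name__ == "__main__":
    print(lastmod_matches("2012-10-01T02:39:54Z", "Mon, 01 Oct 2012 02:39:54 GMT"))  # True
    # MD5 of b"hello", hex vs. base64
    print(md5_matches("5d41402abc4b2a76b9719d911017c592", "XUFAKrxLKna5cZ2REBfFkg=="))  # True
```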

Martin Klein

Jan 22, 2013, 6:47:34 PM
to resour...@googlegroups.com

Michael Nelson

Jan 23, 2013, 11:19:26 AM
to Martin Klein, resour...@googlegroups.com

> 12) do we need/want to register our relation type to avoid the full
> URI in sections 10.3.2 and 10.3.3
> - if yes, can we do it with this spec
> - would also be a candidate to replace rel="top", which is not defined anyway
> - proposed: rel="resourcesync"
> - proposed: rel="service" as defined in RFC 5023
> http://tools.ietf.org/html/rfc5023

note: other candidates to consider include "contents" and "index"

they're defined in the HTML spec:

http://www.w3.org/TR/1999/REC-html401-19991224/types.html#type-links

but there is nothing to restrict them to HTML media types. in other
words, we could build a good argument that an RS capability list really is
a contents/index for "the capabilities that a Source provides".

> 14) should we acknowledge other sitemap extensions
> - expires
> - <changefreq>never</changefreq>
> - "never. Use this value for archived URLs."
> - http://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668

just for clarification, <expires> is an extension that we've discussed,
but <changefreq>never</changefreq> is part of the original sitemap syntax
and it appears that it is something that is actually used. see:

http://www.cnn.com/sitemaps/sitemap-articles-2012-09.xml

...
<url>
<loc>http://www.cnn.com/2012/09/30/world/americas/venezuela-elections/index.html</loc>
<lastmod>2012-10-01T02:39:54Z</lastmod>
<changefreq>never</changefreq>
<priority>0.5</priority>
</url>
...
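Since <changefreq>never</changefreq> is already in the base sitemap schema, a Destination could pick out archived URLs using nothing beyond a standard XML parser. Below is a sketch in Python. It assumes the standard sitemaps.org namespace on the elided <urlset> wrapper. The second entry is made up.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The CNN entry from above, wrapped in an assumed <urlset>, plus a made-up
# second entry that is still expected to change.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.cnn.com/2012/09/30/world/americas/venezuela-elections/index.html</loc>
    <lastmod>2012-10-01T02:39:54Z</lastmod>
    <changefreq>never</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.example.com/front-page.html</loc>
    <changefreq>hourly</changefreq>
  </url>
</urlset>"""


def archived_urls(sitemap_xml: str) -> List[str]:
    """Return the <loc> of every <url> whose <changefreq> is "never"."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for url in root.findall("sm:url", NS):
        freq = url.findtext("sm:changefreq", default="", namespaces=NS)
        if freq.strip().lower() == "never":
            found.append(url.findtext("sm:loc", default="", namespaces=NS).strip())
    return found


if __name__ == "__main__":
    print(archived_urls(SITEMAP))
```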

fyi,

Michael

Martin Klein

Jan 23, 2013, 11:19:47 AM
to resour...@googlegroups.com
fwd'ing to list


---------- Forwarded message ----------
From: Michael Nelson <m...@cs.odu.edu>
Date: Wed, Jan 23, 2013 at 8:50 AM
Subject: Re: [resourcesync] ResourceSync call tomorrow 12.30pm EST
To: Martin Klein <martink...@gmail.com>



> - proposed: rel="service" as defined in RFC 5023
> http://tools.ietf.org/html/rfc5023


interesting... "service" is in the registry, but it is not actually
defined in the RFC. the RFC *does* define a new mime type for service
documents:

application/atomsvc+xml

which does suggest some flexibility in rel="service":

rel="service" type="application/atomsvc+xml"
rel="service" type="foo/bar"

means the two should not be confused, b/c the type attribute should
provide some guidance as to which one a client should follow...

so we probably *could* reuse service, but technically we should
probably have a unique mime type with it. right now we're just
linking to a generic application/xml mime type via sitemaps.
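Client-side, the rel + type disambiguation might look like the sketch below. The hrefs are hypothetical, and "foo/bar" is the placeholder type from above. No ResourceSync-specific media type exists yet, so none is used here.

```python
from typing import List, Optional, Tuple

# (rel, type, href); the hrefs are hypothetical, "foo/bar" is the
# placeholder media type from the message above
Link = Tuple[str, str, str]

LINKS: List[Link] = [
    ("service", "application/atomsvc+xml", "http://example.com/atom/service"),
    ("service", "foo/bar", "http://example.com/other/service"),
]


def media_type(value: str) -> str:
    # media types are case-insensitive; drop parameters such as charset
    return value.split(";")[0].strip().lower()


def select_link(links: List[Link], rel: str, wanted_type: str) -> Optional[str]:
    """Return the first href whose rel contains `rel` and whose type matches."""
    for link_rel, link_type, href in links:
        if rel in link_rel.split() and media_type(link_type) == media_type(wanted_type):
            return href
    return None


if __name__ == "__main__":
    print(select_link(LINKS, "service", "foo/bar"))  # http://example.com/other/service
```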

also, the doc has a ref for:

[RFC 6415]
IETF RFC 6415: Web Host Metadata, E. Hammer-Lahav, B. Cook, October 2011.

but I don't see it used anywhere. I don't know if it will show up in
a future section or not. I'm thinking we only need RFC 5875 (already
cited).


regards,

Graham Klyne

Feb 1, 2013, 9:11:17 AM
to Martin Klein, resour...@googlegroups.com
FWIW, I recently had an extensive conversation with Erik Wilde about a similar
issue w.r.t. provenance access services (but in the context of possibly using
RDF as the service document format).

A core element of the discussion was the use of content negotiation of
content-type as a switching point for different kinds of service interaction
rather than different link relations.

Cf.
PROV issue
- http://www.w3.org/2011/prov/track/issues/425
Discussion thread in W3C LDP archives
- http://lists.w3.org/Archives/Public/public-ldp/2012Dec/0003.html
to
- http://lists.w3.org/Archives/Public/public-ldp/2012Dec/0029.html

The upshot of all this, I think, is a possible argument on REST principles of
re-using the existing link relation (assuming the semantics match) and using a
new content-type to distinguish service details.

Roughly, the link relation should convey WHY one might want to follow a link,
and the content type should convey HOW to interpret what you find on doing so.
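Graham's WHY/HOW split can be sketched as a client that follows a link because of its relation, then dispatches on the response's Content-Type. It also advertises what it understands via Accept. The sketch is in Python. The handlers are placeholders, and "application/vnd.example.resourcesync+xml" is an invented media type, not a registered one.

```python
from typing import Callable, Dict


def handle_atom_service(body: bytes) -> str:
    # placeholder: a real client would parse the Atom service document here
    return "Atom service document"


def handle_capability_list(body: bytes) -> str:
    # placeholder: a real client would parse the capability list here
    return "ResourceSync capability list"


# HOW: media type -> handler. The second media type is made up for
# illustration; no such type has been registered.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "application/atomsvc+xml": handle_atom_service,
    "application/vnd.example.resourcesync+xml": handle_capability_list,
}


def accept_header() -> str:
    """Advertise every media type we can handle, for content negotiation."""
    return ", ".join(HANDLERS)


def interpret(content_type: str, body: bytes) -> str:
    """Dispatch on the response's Content-Type, ignoring parameters such as charset."""
    media_type = content_type.split(";")[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise ValueError(f"no handler for {media_type!r}")
    return handler(body)


if __name__ == "__main__":
    # WHY: we followed rel="service"; HOW: decided by what the server sent back
    print(accept_header())
    print(interpret("application/atomsvc+xml; charset=utf-8", b"<service/>"))
```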

#g
--