--
Jay Rossiter | Jack of All Trades
j...@dlvr.it | Phone: 503.896.6187 | Fax: 503.235.2216
Website: dlvr.it | RSS: blog.dlvr.it | Support: support.dlvr.it
I see several problems, at least one of which I know has been discussed before.
Section 4:
* Discovery is self-contradictory. "HTTP response from the publisher MUST include at least one Link Header" [...] "In the absence of HTTP Link headers, subscribers MAY fall back to other methods to discover the hub". Changing MUST to SHOULD would solve this; however, I believe that alternate methods, such as those for HTML and feeds, should be explicitly defined, not relegated to the vague "MAY use embedded link elements".
* Behavior should be defined for when HTTP headers and body content conflict with each other by specifying different self or hub links. Which one takes priority? I would imagine content wins, since it's closest to the source (à la CSS, where element-embedded styles win out over classes or IDs).
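To make the precedence question concrete, here is a minimal sketch, assuming links have already been extracted from the body by some other means. The names parse_link_header and resolve_links, the body_wins flag, and the example URLs are all hypothetical; the regular expression only handles the simple `<url>; rel="..."` form and is not a full RFC 5988 parser.

```python
import re

# Matches the simple form <url>; rel="name" (quotes optional). Not a full RFC 5988 parser.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(value):
    """Return {rel: [urls]} for a single Link header value."""
    links = {}
    for url, rel in LINK_RE.findall(value):
        # A rel attribute may carry several space-separated relation types.
        for token in rel.split():
            links.setdefault(token.lower(), []).append(url)
    return links


def resolve_links(header_links, body_links, body_wins=True):
    """Pick hub/self links when headers and body disagree.

    body_wins=True implements the "content wins" suggestion; the other
    source is used only for relations the preferred source does not define.
    """
    first, second = (body_links, header_links) if body_wins else (header_links, body_links)
    return {rel: first.get(rel) or second.get(rel, []) for rel in ("hub", "self")}


header = parse_link_header(
    '<https://hub.example/>; rel="hub", <https://example.com/feed>; rel="self"'
)
body = {"hub": ["https://other-hub.example/"]}  # hypothetical links found in the document body
resolved = resolve_links(header, body)
```

Whichever source the spec ends up preferring, the point is that the rule fits in one line once it is written down.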
Section 5.2.1: (See also Section 7)
* This should be either 201 Created or 202 Accepted, not "any 2xx". These codes have meanings - let's use them. It makes no sense for a server to issue a "206 Partial Content" as a response for accepting a subscription, so it shouldn't be allowed.
Section 5.2.2:
* I'm trying to understand this section, because it's worded confusingly. This is a final handshake after verifying the subscription?
The hub must immediately push the current state of the subscribed topic? The subscription can be denied by the hub after final verification occurs? If I'm understanding this, I'd prefer that the accept and denial were handled the same. "accepted" or "denied", not <push content> vs. "denied".
I say this because I don't keep a "current topic state" in a retrievable location for the publishing system - a queue of outgoing requests is maintained by the backend when new updates arrive from publishers, and if that queue is empty, there's nothing to send.
Section 6:
* This is PubSubHubbub. Publish is not only part of the name, it's the point of the protocol.
Publisher and hub CAN agree on a separate publishing mechanism outside of the spec, but the spec MUST include a standard, universal mechanism that all hubs implement; otherwise no publisher code written against the spec can be reused across hubs. Ideally this would be a push notification of its own, but that adds security concerns between the hub and publisher, so a simple ping using the content URL (as in the 0.3 spec) is sufficient.
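For reference, a 0.3-style ping is small enough to sketch in a few lines. This assumes the 0.3 form-encoded parameters hub.mode=publish and hub.url; the function name build_publish_ping and the example URLs are made up for illustration, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) a 0.3-style "new content" ping for topic_url."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


ping = build_publish_ping("https://hub.example/", "https://example.com/feed")
# urllib.request.urlopen(ping) would actually send it.
```

Because the ping carries only the topic URL, the hub still has to fetch the content itself, which is what keeps the hub/publisher trust question simple.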
Section 7:
* I've tried to advocate against the "any 200 is success" idea since the beginning. In my opinion, a successful push MUST respond with a 202 Accepted. Anything else MUST be considered a failure, and continued failures should prompt a subscription verification attempt. Allowing for 200 responses means that a hub will continue to publish to an incorrectly configured site (perhaps one which has relocated endpoints, or shut down, or had its hostname expire, etc.) until such time as subscription verification occurs which, if you use pushpress as an example, is ten years from now.
* "up to some reasonable maximum over a reasonable time period" needs better definition.
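One way to pin down both points above is a small delivery policy: only 202 counts as success, failures back off exponentially, and repeated failures trigger re-verification. This is a sketch under stated assumptions, not anything the draft specifies: the names DeliveryState and next_action and every number (5 failures, 60-second base delay, one-hour cap) are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DeliveryState:
    """Per-subscription delivery bookkeeping kept by the hub."""
    consecutive_failures: int = 0


def next_action(state, status, max_failures=5, base_delay=60, max_delay=3600):
    """Decide what to do after one delivery attempt.

    Returns ("ok", 0), ("retry", delay_seconds) or ("reverify", 0).
    Only 202 Accepted is treated as success; all thresholds are assumed values.
    """
    if status == 202:
        state.consecutive_failures = 0
        return ("ok", 0)
    state.consecutive_failures += 1
    if state.consecutive_failures >= max_failures:
        return ("reverify", 0)
    # Exponential backoff: 60s, 120s, 240s, ... capped at max_delay.
    delay = min(base_delay * 2 ** (state.consecutive_failures - 1), max_delay)
    return ("retry", delay)


state = DeliveryState()
first = next_action(state, 200)   # a plain 200 OK is not a success under this policy
second = next_action(state, 404)
third = next_action(state, 202)
```

With concrete numbers like these in the spec, "reasonable maximum" and "reasonable time period" stop being left to each hub's guess.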
Section 8:
* "202 Accepted"
small question, do you plan to continue using this mailing list or one from:
http://www.w3.org/community/pubsub/
cheers!
~ elf pavlik ~
In the absence of HTTP Link headers, subscribers MAY fall back to other methods to discover the hub(s) and the canonical URI of the topic. If the topic is an Atom or RSS feed, it MAY use embedded link elements as described in Appendix B of Web Linking [RFC5988]. Similarly, for HTML pages, it MAY use embedded link elements as described in Appendix A of Web Linking [RFC5988]. Finally, publishers MAY also use the Well-Known Uniform Resource Identifiers [RFC5785] .host-meta to include the <Link> element.
In the absence of HTTP Link headers, subscribers MAY fall back to other methods to discover the hub(s) and the canonical URI of the topic. If the topic is an XML based feed, it MAY use embedded link elements as described in Appendix B of Web Linking [RFC5988]. Similarly, for HTML pages, it MAY use embedded link elements as described in Appendix A of Web Linking [RFC5988]. Finally, publishers MAY also use the Well-Known Uniform Resource Identifiers [RFC5785] .host-meta to include the <Link> element.
After a more careful read, here are some more comments. Changes are in red with rationale indented below:
5.1: Hubs MUST allow subscribers to re-request subscriptions that are already active.
    Was activate. Should be either active or activated. I prefer the former.
5.1.1: The callback URL MAY contain arbitrary query string parameters (e.g., ?foo=bar&red=fish). Hubs MUST preserve the query string during subscription verification by adding new parameters. Existing parameters with names that overlap with those used by verification requests will not be overwritten, and added parameters must appear after the existing ones. For event notification, the callback URL will be POSTed to including any query-string parameters in the URL portion of the request, not as POST body parameters.
    The difference here is that the order of parameters can change if it does not matter, preserving the semantics. Example: original query string ?k1=a1&k2=b1. Hub wants to add k1=a2 and k3=c1. With previous wording these two query strings are valid: ?k1=a1&k2=b1&k1=a2&k3=c1 and ?k1=a1&k2=b1&k3=c1&k1=a2. With the new wording, any query string where k1=a1 comes before k1=a2 is valid. For example this is now allowed: ?k1=a1&k1=a2&k2=b1&k3=c1. This provides flexibility in implementation without any loss of information. The only important thing here is that k1=a1 comes before k1=a2 which might have a meaning for the receiver so it is preserved.
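The ordering rule in the rationale above can be sketched as a check on the values of each parameter name. This is an illustrative reading, not spec text: the helper names add_verification_params and is_valid_under_new_wording and the callback URL are invented, and the check implements "original values of a name must come first among that name's values".

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_verification_params(callback_url, new_params):
    """Append new (name, value) pairs after the existing query string, overwriting nothing."""
    parts = urlsplit(callback_url)
    existing = parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(existing + list(new_params))))


def _values_by_name(query):
    """Group values by parameter name, keeping their order of appearance."""
    grouped = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        grouped.setdefault(name, []).append(value)
    return grouped


def is_valid_under_new_wording(original_query, final_query):
    """True if, for every name, the original values still come first and in order."""
    original = _values_by_name(original_query)
    final = _values_by_name(final_query)
    return all(
        final.get(name, [])[: len(values)] == values
        for name, values in original.items()
    )


url = add_verification_params(
    "https://subscriber.example/cb?k1=a1&k2=b1", [("k1", "a2"), ("k3", "c1")]
)
```

Appending, as add_verification_params does, always satisfies the check; the looser rule only matters for hubs that build the query string some other way.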
5.2: The hub verifies a subscription request by sending an HTTP [RFC2616] GET request to the subscriber's callback URL as given in the subscription request. This request has the following query string arguments added (format described in Section 17.13.4 of [W3C.REC‑html401‑19991224]):
    See above comments for 5.1.1.
7: The hub MAY reduce the payload to a diff between two consecutive versions if its format allows it.
    It is important to specify that this diffing happens between two consecutive versions and the hub will not keep infinite state to implement this diffing. Let me explain with an example. Publisher pings the hub. Feed contains items AB. The hub publishes AB to subscribers. Publisher pings the hub again. Feed contains items BC. Hub publishes C to subscribers because B was already published before. Publisher pings the hub again. Feed contains items AD. The hub SHOULD publish AD to subscribers and not just D. There are several advantages to AD vs D. Publishing AD means that subscribers which subscribed after the ABC publish will see A now. It also means that the state the hub needs to keep to do the diffing is much smaller than keeping all the state since the beginning of time (think of a YouTube firehose feed for example). Plus I can't think of a use case where publishing D is better than publishing AD. That said, usually feeds on PubSubHubbub model a stream of updates and feeds are a sliding window of this stream, so I do not expect this difference in behavior to be visible to subscribers for these kinds of feeds.
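The AB / BC / AD walk-through above can be sketched with a hub that remembers only the previous version of the feed. The class name ConsecutiveDiffHub and its on_ping method are invented for illustration, and items are plain strings standing in for feed entries.

```python
class ConsecutiveDiffHub:
    """Keeps only the last fetched version of a topic and diffs against it."""

    def __init__(self):
        self.previous_items = []

    def on_ping(self, current_items):
        """Return the items to push: those absent from the previous version only."""
        previous = set(self.previous_items)
        payload = [item for item in current_items if item not in previous]
        # State stays bounded: one version per topic, not the full history.
        self.previous_items = list(current_items)
        return payload


hub = ConsecutiveDiffHub()
first = hub.on_ping(["A", "B"])
second = hub.on_ping(["B", "C"])
third = hub.on_ping(["A", "D"])  # A is pushed again because it was not in the previous version
```

A hub that wanted to publish only D in the last step would have to remember every item it had ever sent for the topic.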
The POST request must include the Self Link Header and a Hub Link Header that correspond to the topic's URL and the hub's URL, respectively.
    Previous wording is confusing.
Additional comments:
Permanent vs Non-Permanent subscriptions. I think the behavior specified is ok, but the way it is specified is rather confusing. Let's see how these two subscriptions "differ":
- permanent subscription: no hub.lease_seconds is specified. But the hub responds with hub.lease_seconds which represents how many seconds should elapse before it attempts to refresh the subscription.
- non-permanent subscription: hub.lease_seconds is specified. The hub responds with the hub.lease_seconds which represents how many seconds should elapse before it attempts to refresh the subscription.
Woot? Notice the last sentence for each of the two items is the *same*. The only difference between permanent vs non-permanent subscription is that the subscriber does or does not provide a *hint* for hub.lease_seconds. So why don't we stop the distinction between permanent vs non-permanent subscriptions altogether? Notice the hub MAY choose to respect hub.lease_seconds in the subscription request or not (5.1). Removing this distinction from the spec will simplify it.
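Treating hub.lease_seconds purely as a hint collapses both cases into one code path, roughly as below. This is a sketch only: the function name choose_lease_seconds and the default, minimum, and maximum values are assumptions, not numbers from the spec.

```python
def choose_lease_seconds(requested=None, default=86400, minimum=3600, maximum=864000):
    """Return the lease the hub will report back in hub.lease_seconds.

    requested is the subscriber's optional hint; the hub is free to adjust it.
    All bounds here are illustrative values.
    """
    if requested is None:
        return default
    return max(minimum, min(requested, maximum))


no_hint = choose_lease_seconds()
short_hint = choose_lease_seconds(60)
honored_hint = choose_lease_seconds(7200)
```

Either way the subscriber learns the effective lease from the hub's response, which is why the permanent/non-permanent distinction adds nothing.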
I think we want to specify this a little bit more.