|PubSubHubbub 0.4 project||Julien||3/29/12 1:02 PM|
I hope you're doing well.
For the past couple of months, several other people and I have tried to identify
how we could make PubSubHubbub better, by fixing some of its issues,
but also opening the door to more use cases (private resources... etc).
It is based on a lot of experience that we have accumulated by hosting
some of the biggest hubs out there, but also conversations we've had
with publishers who sometimes didn't go down the PubSubHubbub way.
We came to the conclusion that there was no way we could make 0.4
backward compatible because 0.3 makes too many assumptions about the
types of resources (Atom or RSS feeds).
Here is a draft of how the spec could evolve. We have already received
a lot of feedback from people who implemented the previous spec
but also from people who want to implement it now that it solves some
of the issues.
Git repo (feel free to check out):
Human readable version at:
I would personally appreciate your constructive feedback.
|Re: [pubsubhubbub] PubSubHubbub 0.4 project||Jay R. [dlvr.it]||3/29/12 3:14 PM|
I see several problems, at least one of which I know has been discussed before.
* Discovery is self-conflicting. "HTTP response from the publisher MUST include at least one Link Header" [...] "In the absence of HTTP Link headers, subscribers MAY fall back to other methods to discover the hub". Changing MUST to SHOULD would solve this; however, I believe that alternate methods, such as those for HTML and feeds, should be explicitly defined, not left to the vague "MAY use embedded link elements".
* Behavior should be defined for when http headers and body content conflict with each other by specifying differing self or hub links. Which one takes priority? I would imagine content wins, since it's closest to the source. (a la CSS - element-embedded styles always win out over classes or IDs)
Section 5.2.1: (See also Section 7)
* This should be either 201 Created, or 202 Accepted, not "any 200". These codes have meanings - let's use them. It makes no sense for a server to issue a "206 Partial Content" as a response for accepting a subscription, so it shouldn't be allowed.
* I'm trying to understand this section, because it's worded confusingly. This is a final handshake after verifying the subscription? The hub must immediately push the current state of the subscribed topic? The subscription can be denied by the hub after final verification occurs? If I'm understanding this, I'd prefer that the accept and denial were handled the same. "accepted" or "denied", not <push content> vs. "denied". I say this because I don't keep a "current topic state" in a retrievable location for the publishing system - a queue of outgoing requests is maintained by the backend when new updates arrive from publishers, and if that queue is empty, there's nothing to send.
* This is PubSubHubbub. Publish is not only part of the name, it's the point of the protocol. Publisher and Hub CAN agree on a separate publishing mechanism outside of the spec, but the spec MUST include a standard, universal mechanism that all hubs implement or absolutely no code can be written which applies the spec but is reusable on any hub. Ideally this would be a push notification of its own, but that adds additional security concerns between the hub and publisher, so a simple ping using the content URL (as in the 0.3 spec) is sufficient.
* I've tried to advocate against the "any 200 is success" idea since the beginning. In my opinion, a successful push MUST respond with a 202 Accepted. Anything else MUST be considered a failure, and continued failures should prompt a subscription verification attempt. Allowing for 200 responses means that a hub will continue to publish to an incorrectly configured site, (perhaps one which has relocated endpoints, or shutdown, or had their hostname expire, etc.) until such a time as subscription verification occurs which, if you use pushpress as an example, is ten years from now.
* "up to some reasonable maximum over a reasonable time period" needs better definition.
* "202 Accepted"
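To make the difference between "any 2xx is success" and "only 202 is success" concrete, here is a minimal sketch of how a hub might classify delivery responses and decide when to re-verify a subscription. The names (`Subscription`, `record_delivery`) and the failure threshold are assumptions made for illustration; neither the 0.3 spec nor the 0.4 draft defines them.

```python
from dataclasses import dataclass

# Hypothetical threshold: the draft only says "reasonable", no number is given.
MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class Subscription:
    callback: str
    consecutive_failures: int = 0


def delivery_succeeded(status: int, strict: bool = True) -> bool:
    """Strict mode accepts only 202; lenient mode accepts any 2xx."""
    if strict:
        return status == 202
    return 200 <= status < 300


def record_delivery(sub: Subscription, status: int, strict: bool = True) -> bool:
    """Update the failure counter; return True when re-verification is due."""
    if delivery_succeeded(status, strict):
        sub.consecutive_failures = 0
        return False
    sub.consecutive_failures += 1
    return sub.consecutive_failures >= MAX_CONSECUTIVE_FAILURES
```

Under the strict rule, a parked domain that answers every POST with a plain 200 is flagged after a few attempts instead of receiving content until the lease runs out.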
|Re: [pubsubhubbub] PubSubHubbub 0.4 project||Julien||3/30/12 10:44 AM|
Thanks for the feedback. I am sorry that you had to raise issues you have already discussed before; I guess I missed your earlier comments.
Please see below for my responses.
On Fri, Mar 30, 2012 at 12:14 AM, Jay R. [dlvr.it] <jros...@dlvr.it> wrote:
I want 0.4 to be the reference implementation, so the MUST is probably here to stay. However, I think it's fair to tell subscribers that they may try 'older' versions of the spec if 0.4 doesn't seem to be implemented.
Since there is no way to make 0.4 backward compatible, let's at least make sure it's easy to build on rather than use a bazillion subcases.
The priority should always be to the latest version of the spec. If spec 0.4 is implemented, then that is what people should rely on. If not, then they may try 0.3... etc.
Again, I want to make the spec simple and clear enough, without too many subcases. The spec should not include the expected behavior when the spec is not implemented (doh!)
Agreed. Let's change this to 202 Accepted.
Up until now, the spec assumed that the hub would always "validate" a subscription once it had been confirmed by the subscriber. This is not enough, as there are cases where the feed publisher may also decide to accept or refuse (asynchronously) the subscription. Additionally, there are times when a publisher may decide to "cancel" an existing subscription, which again is not possible with the current approach.
I initially had the same idea, and I'm ok with that approach.
Please, let's just assume that existing implementations will need to be reworked no matter what.
We indeed have had this discussion several times.
Publishers and hubs MUST agree on a publishing mechanism. The rest doesn't matter. Why should the spec rule on something that anyone can decide to change without affecting anyone else?
We can still list a bunch of mechanisms in an annex.
Ok with 202 instead of 200.
Please, provide one.
|Re: PubSubHubbub 0.4 project||Julien||4/3/12 1:52 AM|
For those who do not feel like reading the spec and comparing each of its points, here is a summary of the changes, and why
we think they're good!
|Re: PubSubHubbub 0.4 project||Andy Dennie||4/3/12 5:08 AM|
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||4/3/12 5:10 AM|
[Doh! Thanks Andie]
|Re: [pubsubhubbub] PubSubHubbub 0.4 project||elf Pavlik||4/3/12 5:25 PM|
|Re: [pubsubhubbub] PubSubHubbub 0.4 project||Julien||4/4/12 3:46 AM|
For as long as 0.4 is not adopted, 0.3 is the "main" one, so the discussion will happen here and there.
When we can get W3C to take PubSubHubbub in its arms, then, the discussions should happen there (if most people agree).
|Re: PubSubHubbub 0.4 project||vrypan||4/6/12 3:24 AM|
Question: HTTP resources may be anything over HTTP, right? A plain text file, or an arbitrary XML, or even a media file like a video or a photo?
|Re: PubSubHubbub 0.4 project||vrypan||4/6/12 3:39 AM|
I like the way HTTP headers are used.
However, I would like to see an alternative place to store this kind of information other than the HTTP headers, because "advanced" HTTP header manipulation is out of reach for many users and in many cases: from complex CMSs to simple hosting solutions like Amazon S3 (I tried to set a "Link:" header and it looks like S3 will not accept it).
I think that it would be useful if 4.Discovery described an alternative fallback mechanism based on some reasonable convention. For example, something like an extended sitemap.xml that includes rel=pub resources for each URL, and the client would look up if everything else fails.
It's not as elegant as the HTTP headers, but it should be much easier for many publishers to implement, IMHO. Would you consider something like this, or is it totally outside the philosophy PubSubHubbub was designed around?
On Thursday, March 29, 2012 11:02:12 PM UTC+3, Julien wrote:
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||4/6/12 5:31 AM|
Thanks for the feedback! Indeed, HTTP resources can be anything... at least in theory, and I'm expecting some specific use cases to break that theory :)
I perfectly understand and agree that using HTTP headers is a bit of a constraint. However, as you've noted, if we want to be able to provide a protocol that "works" with any type of content, we cannot put these elements in the HTTP body.
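For reference, header-based discovery on the subscriber side can be sketched as follows. This is a simplified illustration, not a full RFC 5988 parser: it assumes parameter values contain no commas, and the URLs in the usage note are made up.

```python
import re

# One link-value: <target> followed by its parameters, up to the next comma.
_LINK_VALUE = re.compile(r'<([^>]+)>([^,]*)')
# The rel parameter, quoted or not.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def discover_from_link_header(link_header: str) -> dict:
    """Collect the targets of rel="hub" and rel="self" from a Link header."""
    found = {"hub": [], "self": []}
    for target, params in _LINK_VALUE.findall(link_header):
        match = _REL_PARAM.search(params)
        if match is None:
            continue
        # A rel value may list several space-separated relation types.
        for rel in match.group(1).split():
            if rel.lower() in found:
                found[rel.lower()].append(target)
    return found
```

Given a header such as `<http://hub.example.com/>; rel="hub", <http://example.com/feed>; rel="self"`, the function returns one hub URL and one self URL, whatever the content type of the body.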
Your suggestion about using a "well known" url is interesting and I like it. I'd love to have feedback from folks like Blaine or Evan on this, because this could very well be a viable alternative.
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Justin Richer||4/6/12 5:37 AM|
I think that using additional discovery mechanisms is a good idea here, especially things that get the discovery info out of the body of the resource. Using either host-meta/XRD or SWD, you basically end up querying with a combination of a principal (in this case, the URL of the resource you're accessing) and the service that you're after. You construct a well-known URL on the right base with the right parameters using a deterministic method and come back with a discovery document that tells you where to go.
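As a rough sketch of the host-meta variant of that idea: the well-known location is derived from the resource URL alone, and the returned document is searched for hub links. The code assumes the JSON form of host-meta (a JRD document with a top-level "links" array) and only covers host-wide links; per-resource lookups through link templates are left out. The function names are illustrative.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def host_meta_url(resource_url: str, as_json: bool = True) -> str:
    """Build the host-meta URL for the host that serves resource_url."""
    parts = urlsplit(resource_url)
    path = "/.well-known/host-meta" + (".json" if as_json else "")
    # Keep scheme and host; drop the resource's own path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def hubs_from_jrd(document: str) -> list:
    """Return the href of every link whose rel is "hub" in a JRD document."""
    links = json.loads(document).get("links", [])
    return [link["href"] for link in links
            if link.get("rel") == "hub" and "href" in link]
```

The subscriber would fetch `host_meta_url(topic)` and pass the response body to `hubs_from_jrd`; the fetch itself is omitted here.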
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||vrypan||4/6/12 6:41 AM|
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||5/2/12 6:01 AM|
I just pushed an updated version of this draft.
It now includes the following (minor) points:
- Comments/Fixes by Walter Groix,
- Symmetric "accepted" and "denied" when the hub validates the subscription
- Use of From header (based on Blaine's feedback)
- Use of Location header when "denied" to indicate that another
- Allowing for discovery thru host-meta well known url.
Please, feel free to review and submit any change that is appropriate.
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||5/2/12 6:02 AM|
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Blaine Cook||5/2/12 8:15 AM|
On 2 May 2012 14:01, Julien Genestoux <julien.g...@gmail.com> wrote:
This sounds awesome, Julien!
– I'd suggest the following text for §4❡2 (major change is to use
"In the absence of HTTP Link headers, subscribers SHOULD fall back to
other methods to discover the hub(s) and the canonical URI of the
topic. If the topic is an Atom or RSS feed, the publisher MAY provide
embedded link elements as described in Appendix B of Web Linking
[RFC5988], and the subscriber SHOULD use those links as per Appendix B
of [RFC5988]. Similarly, for HTML pages, the publisher MAY use
embedded link elements as described in Appendix A of Web Linking
[RFC5988], and subscribers SHOULD use those links as per Appendix A of
[RFC5988]. Finally, publishers MAY also use Web Host Metadata
[RFC6415] to include the <Link> element with a rel-value equal to
"hub" as with the other approaches; publishers SHOULD provide the JSON
variant of [RFC6415]. Subscribers SHOULD support this method for
I started reading the spec at the link you posted and realised that
it's not the updated one at all. I don't know if the above comment
stands, but I'll hold off until the link is updated. :-)
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||5/3/12 10:43 AM|
Not sure what you mean, but I had previously updated the file there: http://superfeedr-misc.s3.amazonaws.com/pubsubhubbub-core-0.4.html
Feel free to send your changes here, and I'll make sure I reflect them there :)
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Alkis Evlogimenos||5/30/12 1:12 AM|
In section 4 (discovery), I believe it is best to avoid explicitly mentioning Atom or RSS and instead refer to them indirectly as XML formats. The way the section is currently worded suggests that sitemaps (for example) can only be discovered through link headers, but this is problematic since link headers are not always an option for webmasters.
In the absence of HTTP Link headers, subscribers MAY fall back to other methods to discover the hub(s) and the canonical URI of the topic. If the topic is a Atom or RSS feed, it MAY use embedded link elements as described in Appendix B of Web Linking [RFC5988]. Similarly, for HTML pages, it MAY use embedded link elements as described in Appendix A of Web Linking [RFC5988]. Finally, publishers MAY also use the Well-Known Uniform Resource Identifiers [RFC5785] .host-meta to include the <Link> element.
In the absence of HTTP Link headers, subscribers MAY fall back to other methods to discover the hub(s) and the canonical URI of the topic. If the topic is an XML based feed, it MAY use embedded link elements as described in Appendix B of Web Linking [RFC5988]. Similarly, for HTML pages, it MAY use embedded link elements as described in Appendix A of Web Linking [RFC5988]. Finally, publishers MAY also use the Well-Known Uniform Resource Identifiers [RFC5785] .host-meta to include the <Link> element.
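A minimal sketch of the embedded-link fallback for XML feeds, assuming the links are Atom `link` elements placed on the feed root (Atom) or under `channel` (RSS 2.0 with the Atom namespace). Other XML formats such as sitemaps would need their own placement rule, which the thread has not defined.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def links_from_feed(xml_text: str) -> dict:
    """Collect feed-level rel="hub" and rel="self" links from a feed."""
    root = ET.fromstring(xml_text)
    # Atom keeps feed-level links on the root; RSS 2.0 keeps them in <channel>.
    container = root.find("channel") if root.tag == "rss" else root
    found = {"hub": [], "self": []}
    if container is None:
        return found
    for link in container.findall("{%s}link" % ATOM_NS):
        rel = link.get("rel", "")
        href = link.get("href")
        if rel in found and href:
            found[rel].append(href)
    return found
```

Only direct children are inspected, so links that belong to individual entries are not mistaken for the feed's own hub or self links.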
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||5/30/12 1:29 AM|
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||5/30/12 2:08 AM|
I think it's now important to get more feedback from these services who implemented their own flavor of PubSubHubbub in the past because the previous spec didn't cover their needs.
In my list, I have:
- Facebook (not sure who's in charge now, David Recordon can help?)
- Instagram (Emailed Shayne Sweeney)
- Github (Emailed Rick Olson)
- Flickr (Not sure who is in charge)
- Diaspora (not sure who is in charge...)
I am sure there are many others. Can you help find them? Also, if you know the people in charge at these companies, feel free to get them to participate in this thread. I have already added the folks I know.
Thanks for your precious help... we will soon get there :)
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||5/30/12 2:15 AM|
I forgot to add Wordpress to that list.
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Alkis Evlogimenos||6/1/12 8:14 AM|
After a more careful read, here are some more comments. Changes are in red with rationale indented below:
5.1: Hubs MUST allow subscribers to re-request subscriptions that are already active.
5.1.1: The callback URL MAY contain arbitrary query string parameters (e.g., ?foo=bar&red=fish). Hubs MUST preserve the query string during subscription verification by adding new parameters. Existing parameters with names that overlap with those used by verification requests will not be overwritten, and added parameters must appear after the existing ones. For event notification, the callback URL will be POSTed to including any query-string parameters in the URL portion of the request, not as POST body parameters.
The difference here is that the order of parameters can change if it does not matter, preserving the semantics. Example: original query string ?k1=a1&k2=b1. Hub wants to add k1=a2 and k3=c1. With previous wording these two query strings are valid: ?k1=a1&k2=b1&k1=a2&k3=c1 and ?k1=a1&k2=b1&k3=c1&k1=a2. With the new wording, any query string where k1=a1 comes before k1=a2 is valid. For example this is now allowed: ?k1=a1&k1=a2&k2=b1&k3=c1. This provides flexibility in implementation without any loss of information. The only important thing here is that k1=a1 comes before k1=a2 which might have a meaning for the receiver so it is preserved.
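The example above can be sketched in code. `add_verification_params` shows the simplest compliant behavior (append after the existing parameters), and `preserves_semantics` checks the relaxed rule: for every parameter name in the original query, the original values must still come first, in the same order. Both function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_verification_params(callback: str, hub_params: list) -> str:
    """Append the hub's (name, value) pairs after the existing query string."""
    parts = urlsplit(callback)
    existing = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(existing + list(hub_params))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def _values_by_name(query: str) -> dict:
    grouped = defaultdict(list)
    for name, value in parse_qsl(query, keep_blank_values=True):
        grouped[name].append(value)
    return grouped


def preserves_semantics(original_query: str, new_query: str) -> bool:
    """True if, per name, the original values still lead the value list."""
    before = _values_by_name(original_query)
    after = _values_by_name(new_query)
    return all(after[name][:len(values)] == values
               for name, values in before.items())
```

With the original query `k1=a1&k2=b1`, the reordered `k1=a1&k1=a2&k2=b1&k3=c1` passes the check, while `k1=a2&k1=a1&k2=b1&k3=c1` fails because the added `k1` value now comes first.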
5.2: The hub verifies a subscription request by sending an HTTP [RFC2616] GET request to the subscriber's callback URL as given in the subscription request. This request has the following query string arguments added (format described in Section 17.13.4 of [W3C.REC‑html401‑19991224]):
7: The hub MAY reduce the payload to a diff between two consecutive versions if its format allows it.
The POST request must include the Self Link Header and a Hub Link Header that correspond to topic's url and hub url, respectively.
Permanent vs Non-Permanent subscriptions. I think the behavior specified is ok, but the way it is specified is rather confusing. Let's see how these two subscriptions "differ":
- permanent subscription: no hub.lease_seconds is specified. But the hub responds with hub.lease_seconds which represents how many seconds should elapse before it attempts to refresh the subscription.
- non-permanent subscription: hub.lease_seconds is specified. The hub responds with the hub.lease_seconds which represents how many seconds should elapse before it attempts to refresh the subscription.
Woot? Notice the last sentence for each of the two items is the *same*. The only difference between permanent vs non-permanent subscription is that the subscriber does or does not provide a *hint* for hub.lease_seconds. So why don't we stop the distinction between permanent vs non-permanent subscriptions altogether? Notice the hub MAY choose to respect hub.lease_seconds in the subscription request or not (5.1). Removing this distinction from the spec will simplify it.
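Put differently, the hub-side logic is the same in both cases; the subscriber's value is only an input to it. A small sketch, in which the default, minimum and maximum are invented values (the spec fixes none of them):

```python
from typing import Optional

# Hypothetical hub policy, in seconds.
DEFAULT_LEASE = 10 * 24 * 3600
MIN_LEASE = 3600
MAX_LEASE = 30 * 24 * 3600


def granted_lease(requested: Optional[int] = None) -> int:
    """Return the hub.lease_seconds the hub will send back to the subscriber."""
    if requested is None:
        # No hint given: what used to be called a "permanent" subscription.
        return DEFAULT_LEASE
    # Hint given: honor it within the hub's own bounds.
    return max(MIN_LEASE, min(int(requested), MAX_LEASE))
```

Either way the subscriber receives a concrete hub.lease_seconds, which is why the permanent/non-permanent split adds nothing.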
I think we want to specify this a little bit more.
Google Switzerland GmbH
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||6/4/12 1:19 AM|
This is brilliant. Thanks for the precious help. Please see the comments below!
On Fri, Jun 1, 2012 at 5:14 PM, Alkis Evlogimenos (Άλκης Ευλογημένος) <al...@google.com> wrote:
I am not sure this is the right thing to do. In other words, I think it would add a lot of confusion if params with the same name appeared in the URL, because the subscriber may not know which ones were added by the hub and which ones were already present in the URL...
I understand your point, but I believe this is solved more elegantly by using query param names that should not have collisions... the hub adds query strings with a "hub." namespace. I believe this should be sufficient.
I'd love to have a third opinion though.
Well... same :)
I definitely agree with that... and it's fixed!
Hum. That was actually my understanding... However, I have to confess that these "permanent" subscriptions are quite a pain to deal with... I am now in favor of reducing the default lease duration on our hubs even.
As a matter of fact, we found out that some subscribers will accept any verification of intent... which means that the hub may keep stale URLs forever in its data store. I believe this is a problem in the long term.
By removing the concept of a "permanent" subscription, we force subscribers to care about the subscriptions they make and want to hold, so I'm in favor of that, and I have pushed a commit (5eae049) in that direction.
Please let me know what you think!
Agreed! Thanks for helping out!
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Alkis Evlogimenos||6/6/12 3:31 AM|
Thank you for the fast replies/changes. I put 2 pull requests on github.
hub.lease_seconds becomes required in the response of the hub. Because we made all subscriptions "temporary/ephemeral" IMO it makes sense to require the hub to return hub.lease_seconds so that subscribers know when to renew their subscription.
Removes a mention of temporary subscription. Since all subscriptions are temporary now, it makes no sense to mention the subscription as temporary in this context.
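With hub.lease_seconds always present, a subscriber can schedule its renewal directly from the verification request. The sketch below assumes the usual verification flow (echo hub.challenge with a 2xx to agree, answer 404 to refuse) and renews at 90% of the lease, which is an arbitrary margin. Matching hub.mode and hub.topic against a pending request is left out for brevity.

```python
import time
from urllib.parse import parse_qs


def handle_verification(query_string: str, now: float = None):
    """Return (status, body, renew_at) for a hub verification GET request."""
    params = parse_qs(query_string)
    challenge = params.get("hub.challenge", [None])[0]
    lease = params.get("hub.lease_seconds", [None])[0]
    if challenge is None or lease is None or not lease.isdigit():
        # Missing or malformed parameters: refuse the verification.
        return 404, "", None
    now = time.time() if now is None else now
    # Renew a little before the lease expires (the 9/10 margin is an assumption).
    renew_at = now + int(lease) * 9 // 10
    return 200, challenge, renew_at
```

A subscriber that stores `renew_at` and re-subscribes at that time never depends on the hub having treated the subscription as permanent.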
|Re: [pubsubhubbub] Re: PubSubHubbub 0.4 project||Julien||6/7/12 5:14 PM|
Sorry for the delay; I was traveling to SF and it takes forever from Europe!
I merged the changes... thanks again!