Happy to hear from you on this list!
This is an important question and something we need to get in the
spec. We've got an action item to have a better story for OAuth
support (http://code.google.com/p/pubsubhubbub/issues/detail?id=16).
I've been meaning to take a crack at writing down the design, so
thanks for the nudge. Hopefully we can all iterate on this a bit.
---
First, an obvious statement: private feeds are hard. Here are the
challenges and questions I see, with the answers I'd give for some of
them.
* What system should we use for authenticating consumers of private
feeds (e.g., OpenID, OAuth)?
- We shouldn't be bound to a single system of
authentication/authorization, even if it's OAuth. This should be left
to the Publisher to decide and should be outside the bounds of the
spec.
* Publishers must be able to disable the access of an individual
subscriber without affecting any others.
* Authenticated feeds must benefit from the fan-out speed-up a Hub
provides. If the publisher needs N separate feeds to serve N separate
consumers, the Hub isn't useful for reducing publisher load.
* Must publishers trust Hubs to properly enforce access controls?
- Yes. Publishers already have to trust Hubs not to mess up their
content, so this trust relationship already exists implicitly for
public feeds. The same is true for any aggregation service.
* Should publishers declare their feeds as private ahead of time
(i.e., announce them to the Hub)?
- No. Otherwise, Hubs will be full of announced feeds that nobody
actually subscribes to.
* Should Publishers create a trust relationship with a Hub ahead of time?
- No. Again, this requires the Hub to maintain state that may never be used.
* Auth should be as simple as possible for the Publisher and
subscriber; the complexity should live in the Hub.
---
Here is my proposal for adding private feeds to PubSubHubbub. The goal
is to keep the existing spec as simple as possible.
* Subscription flow
0. (out-of-band, out-of-spec) Subscriber tells the Publisher that it
wants feed P delivered to its subscription handler S. This may require
authentication steps; all of that is outside the scope of the protocol.
OAuth could be a good choice.
1. Subscriber requests a subscription from the Hub using a POST
request (like in the existing spec), but also indicates that the feed
is a private feed that must be authenticated.
2. Hub contacts the publisher to request verification that the
subscriber URL, S, is authorized to receive feed P.
3. Publisher contacts the Hub and verifies that the subscription is
valid. Publisher gives the Hub an authorization key (opaque) to use
for *all* requests for this feed in the future, regardless of the
subscriber it's on behalf of.
4. Hub contacts the subscriber to verify the subscription request
(just as in the existing spec).
(Each of these steps may happen synchronously or asynchronously,
depending on the usual "hub.verify" parameter.)
* Unsubscribe flow
- Works just like the subscribe flow, except that the mode is
"unsubscribe" instead of "subscribe". The purpose is to let the
Publisher know which subscribers the Hub will deliver a feed to, so it
can keep its access lists up to date.
* Publish flow
1. Publisher has new content on feed P, pings the Hub.
2. Hub contacts the Publisher to pull the feed (over SSL), supplying
the Hub's opaque authorization key in the Authorization header (HTTP
Basic Auth, I figure?)
3. Publisher serves the feed to the Hub.
4. Hub serves the feed to all subscribers that have been authorized by
the Publisher.
* Force disable access flow
1. Publisher contacts the Hub to stop a subscriber from receiving a feed.
2. Hub contacts the Publisher to verify the intent.
3. Publisher confirms that the subscriber should in fact lose access.
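To make the Hub's bookkeeping for these four flows concrete, here is a
minimal Python sketch. It is not part of any spec: the class and
callback names (PrivateFeedHub, ask_publisher, confirm_disable,
ask_subscriber, fetch) are hypothetical, and the HTTP round trips are
replaced by plain callables so the state changes are easy to follow.
What it tries to show is that the Hub keeps one opaque key per feed
plus a per-feed set of approved callbacks, so it pulls the feed once
per ping no matter how many subscribers there are.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

# Hypothetical stand-ins for the HTTP round trips in the flows above.
AskPublisher = Callable[[str, str, str], Optional[str]]  # (mode, topic, callback) -> key or None
ConfirmDisable = Callable[[str, str], bool]             # (topic, callback) -> intent confirmed?
AskSubscriber = Callable[[str, str, str], bool]         # (mode, topic, callback) -> verified?
Fetch = Callable[[str, str], str]                       # (topic, key) -> feed body


@dataclass
class PrivateFeed:
    key: Optional[str] = None                           # one opaque key per feed
    subscribers: Set[str] = field(default_factory=set)  # callbacks the Publisher approved


class PrivateFeedHub:
    def __init__(self, ask_publisher: AskPublisher, confirm_disable: ConfirmDisable,
                 ask_subscriber: AskSubscriber) -> None:
        self.ask_publisher = ask_publisher
        self.confirm_disable = confirm_disable
        self.ask_subscriber = ask_subscriber
        self.feeds: Dict[str, PrivateFeed] = {}

    def subscribe(self, topic: str, callback: str) -> bool:
        key = self.ask_publisher("subscribe", topic, callback)     # steps 2-3
        if key is None:
            return False                                            # Publisher said no
        if not self.ask_subscriber("subscribe", topic, callback):  # step 4
            return False
        feed = self.feeds.setdefault(topic, PrivateFeed())
        feed.key = key  # used for *all* future fetches of this feed
        feed.subscribers.add(callback)
        return True

    def unsubscribe(self, topic: str, callback: str) -> bool:
        # Same round trips with mode "unsubscribe"; the Publisher's answer is
        # informational only, so it can keep its access list current.
        self.ask_publisher("unsubscribe", topic, callback)
        if not self.ask_subscriber("unsubscribe", topic, callback):
            return False
        self._drop(topic, callback)
        return True

    def force_disable(self, topic: str, callback: str) -> bool:
        # Force-disable flow: call the Publisher back to confirm intent first.
        if not self.confirm_disable(topic, callback):
            return False
        self._drop(topic, callback)
        return True

    def publish(self, topic: str, fetch: Fetch) -> Dict[str, str]:
        """Publish flow: one authenticated pull, fanned out to every approved callback."""
        feed = self.feeds.get(topic)
        if feed is None or feed.key is None or not feed.subscribers:
            return {}
        body = fetch(topic, feed.key)  # over SSL in the proposal
        return {callback: body for callback in sorted(feed.subscribers)}

    def _drop(self, topic: str, callback: str) -> None:
        feed = self.feeds.get(topic)
        if feed is not None:
            feed.subscribers.discard(callback)
```

Because the Hub enforces the per-subscriber list, the Publisher can cut
off one subscriber with the force-disable flow without rotating the
shared key or affecting anyone else's access.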
---
Some notes:
* I rely on the HTTP channels being encrypted and authenticated with
SSL. I've heard some concerns about this dependence, but I believe that
for server-to-server communication it is acceptable to require either
1) a unique IP address for SSL auth, or 2) TLS, which supports virtual
hosting via SNI (http://en.wikipedia.org/wiki/Server_Name_Indication).
* The Publisher can decide to trust subscriber URLs using whatever
heuristic it wants. If it is okay with approving unencrypted pushes,
it can accept http:// as well as https:// callback URLs.
* The Subscriber can authenticate the Push message coming from the Hub
if the Hub includes the original verify token in the event delivery.
Perhaps this could also reuse the Authorization header with HTTP Basic
Auth. Again, this would rely on an SSL channel.
* We'll need a new, discoverable endpoint for Publisher authorization
that the Hub can use for delivery. I think we should use the 401
response and WWW-Authenticate header (as in Basic Auth) to do this.
When the Subscriber requests a private feed, the Hub will first try to
fetch that feed without credentials, causing the Publisher to serve a
401. The WWW-Authenticate header would include a single challenge,
which would be the URL of the Publisher endpoint to use for
authorization.
* I keep going back to HTTP Basic Auth here because it's by far the
simplest thing for people to implement. Servers and clients everywhere
already support it. We should make private feeds easy to bootstrap for
people running Apache on flat files with only .htaccess available for
authorization.
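For the Basic Auth pieces, here is a minimal Python sketch using only
the standard library. The function names are hypothetical, and two
details are assumptions the proposal leaves open: the opaque key (or
verify token) is carried in the password slot with an empty user name,
and the 401 challenge takes the form Basic realm="<endpoint URL>". As
noted above, none of this is safe without SSL, since Basic credentials
are only base64-encoded, not encrypted.

```python
import base64
import hmac
import re
from typing import Optional


def basic_auth_header(secret: str) -> str:
    """Authorization header value carrying an opaque secret via HTTP Basic Auth.

    Assumption: the secret goes in the password slot with an empty user name.
    """
    token = base64.b64encode((":" + secret).encode("utf-8")).decode("ascii")
    return "Basic " + token


def check_basic_auth(header: Optional[str], expected_secret: str) -> bool:
    """Receiving side (Publisher, or a Subscriber checking its verify token)."""
    if not header:
        return False
    scheme, _, value = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(value.strip(), validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return False
    _, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparison so the check doesn't leak timing information.
    return hmac.compare_digest(password.encode("utf-8"), expected_secret.encode("utf-8"))


# Assumption: the single challenge is `Basic realm="<endpoint URL>"`.
_REALM = re.compile(r'^\s*Basic\s+realm="([^"]*)"\s*$', re.IGNORECASE)


def auth_endpoint_from_challenge(www_authenticate: str) -> Optional[str]:
    """Hub side: extract the Publisher's authorization endpoint from a 401."""
    match = _REALM.match(www_authenticate)
    return match.group(1) if match else None
```

The Hub would send basic_auth_header(key) when pulling the feed, the
Publisher (or a Subscriber checking a delivery) would call
check_basic_auth, and the Hub would call auth_endpoint_from_challenge
on the WWW-Authenticate value of the first 401 it gets.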
---
Hope that all makes sense! Didn't realize how long that would get
until I wrote it all up. What do you think? Anyone else want to chime
in? I'll probably move this to the Wiki if people think it sounds
reasonable. Then we can start considering how it should fit into the
spec document (or if it should be in a separate spec).
Let me know if you have any questions!
-Brett
On Sat, May 30, 2009 at 10:35 PM, Josh Fraser <josh...@gmail.com> wrote:
>
> Glad I grabbed a cup of tea before sitting down to read this one.
> This is definitely not an easy problem to solve. I need to think
> about this some more, but at first read it seems pretty good.
Cool. Thanks for taking the time to read through it. I hope the tea
was caffeinated. =P
> My first instinct on this was for authenticated feeds to merely send
> a notification that something changed, not the actual private content
> of the feed. The subscriber could then fetch the private feed
> directly from the publisher using whichever authentication method the
> publisher already has established. This approach would be simpler to
> implement, but it's incomplete since in some rare cases the fact that
> something was modified could be considered sensitive. Would we want
> to give this as a simpler alternative for publishers who don't care
> about their change data being public?
This is similar to the approach that SUP (FriendFeed's Simple Update
Protocol) takes. People polling the SUP stream can still figure out the
relative frequency of feed updates, since that information is not
protected and leaks out. SUP uses some obscurity to hide the feed URLs
so you can't tell which URL was updated, but there are a variety of
attacks you could make on this, and the consequences would be awful if
the mapping of IDs were ever determined.
I think it's important for the feed update times to be private.
Otherwise you have the kind of honeypot that makes the EFF freak out.
An extreme case: the police see that you updated your feed N times
during a certain time period, and your protection against
self-incrimination is gone.
Beyond that, I'd like the Hub's one-to-many fan-out to keep benefiting
the publisher in the case of authenticated feeds. If subscribers have
to go back and poll the publisher again, we can no longer reduce
publisher load with the Hub.
I also like this scheme because it's so consistent for subscribers.
The subscription flow is exactly the same. The subscriber will receive
events in the same way for public and private feeds. The only addition
is an authorization endpoint for the Publisher, which can be absurdly
simple.
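To show how small the Publisher's side can be, here is a minimal
Python sketch of an authorization endpoint. Everything in it is an
assumption for illustration: the class name, reusing the hub.mode,
hub.topic, and hub.callback parameter names from the existing subscribe
request, and collapsing steps 2 and 3 of the subscription flow into a
single request/response (in the proposal the Publisher contacts the Hub
separately to hand over the key). HTTP plumbing is left out; handle()
takes the query string and returns a status and body.

```python
import secrets
from typing import Dict, Set, Tuple
from urllib.parse import parse_qs


class AuthorizationEndpoint:
    """Hypothetical Publisher-side authorization endpoint (sketch only)."""

    def __init__(self) -> None:
        self.acl: Dict[str, Set[str]] = {}     # feed URL -> callbacks allowed (step 0)
        self.active: Dict[str, Set[str]] = {}  # feed URL -> callbacks the Hub delivers to
        self.keys: Dict[str, str] = {}         # feed URL -> opaque key handed to the Hub

    def grant(self, topic: str, callback: str) -> None:
        # Out of band: the subscriber proved itself however the Publisher likes.
        self.acl.setdefault(topic, set()).add(callback)

    def revoke(self, topic: str, callback: str) -> None:
        # Afterwards the Publisher would run the force-disable flow with the Hub.
        self.acl.get(topic, set()).discard(callback)

    def handle(self, query: str) -> Tuple[int, str]:
        """Answer one verification request; returns (HTTP status, body)."""
        params = {k: v[0] for k, v in parse_qs(query).items()}
        mode = params.get("hub.mode", "")
        topic = params.get("hub.topic", "")
        callback = params.get("hub.callback", "")
        if mode == "unsubscribe":
            # Keeps the Publisher's view of who receives the feed up to date.
            self.active.get(topic, set()).discard(callback)
            return 200, ""
        if mode != "subscribe" or callback not in self.acl.get(topic, set()):
            return 403, ""
        self.active.setdefault(topic, set()).add(callback)
        if topic not in self.keys:
            self.keys[topic] = secrets.token_urlsafe(32)  # same key for every subscriber
        return 200, self.keys[topic]
```

The Hub gets the same key for every subscriber of a feed, which is
what lets it fetch once and fan out; per-subscriber control stays in
the access list rather than in the key.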
The rest of authorization and authentication is out of scope. Perhaps
the nascent email-to-URL translation scheme
(http://brad.livejournal.com/2357444.html) could be used for this.
> In general, I'm a fan of picking the best approach and removing
> options, so feel free to say no. :)
I agree with this approach. The goal of the base spec is to provide
minimal functionality that everyone can use to get stuff done.
Everything beyond that is an optional extension. However, it seems to
me that authenticated feeds should be included in the base spec (in
some form), since it's such a great use case that we'd like to solve.
-Brett