Today I wrote up a first draft for a variation on PubSubHubbub for
JSON-based applications:
http://martin.atkins.me.uk/specs/pubsubhubbub-json
As noted in the specification document, the goal is to make the hub as
dumb as possible, having the payloads just be opaque JSON objects as far
as the hub is concerned. This means that this can be used for multiple
JSON-based applications without changing the hub.
The near-term goal is to allow the fledgling "Activity Streams API" spec
to use this as a transport, but I also want to make it useful for
delivering other content and for proprietary payloads like those used in
Facebook's real-time notifications API.
It'd be nice if this format could be supported alongside Atom and RSS in
the App Engine hub reference implementation, though I've not yet dug
deeply into the code to figure out how hard that might be.
So I guess this would just monitor an entire resource and post the
updated version to the subscriber whenever it changes, rather than
dividing the resource into logical "items" and delivering them separately?
That seems like a fine fallback for non-stream-like resources like
images but I do think it's useful to define a more granular stream
format for JSON so that it can be used to deliver things like activity
streams more efficiently.
* It would be nice if there was a Content-Type required (or at least
encouraged) for topics, so that hubs wouldn't have to infer the
content type from the content. This would make it more efficient for a
single hub to support both Atom/RSS and JSON streams.
* What's the relation between the Discovery Document and the
Notification Message? It seems like it's two different names for what
amounts to the same format, that's just used differently between the
publisher->hub and the hub->subscriber.
* In section 5, it says "Implementations MUST ignore properties that
they do not understand." Do you mean ignore and drop these properties,
or ignore and preserve? I feel like it could be read either way.
* Section 6: "This section defines a message format that can be used
by the publisher..." <- should that be subscriber, or am I misreading
this entirely?
* Section 7 doesn't feel right to me. What you want is an equivalence
comparison for JSON objects; what you're describing is a specific
optimized algorithm for one. This is just an implementation detail for
the hub; it doesn't need to be part of the spec.
* What are the pros and cons of this proposal, versus wrapping JSON
objects in an Atom feed?
--Ravi
Agreed.
In light of Monica's proposal for generic content-types, it seems that
the specific message format proposed in my spec should have its own
specialized MIME type so that it doesn't overlap with the use of the
general type application/json, which might be used for JSON documents
that do not conform to the specific structure I called out in the
specification.
Since this specification defines two differing JSON object structures, it
should probably define a MIME type for each.
I imagine then that there would be several MIME types which are
considered "special" to a hubbub hub, where piecemeal delivery is
possible, and then any others would be treated as opaque blobs as
described in Monica's proposal.
> * What's the relation between the Discovery Document and the
> Notification Message? It seems like it's two different names for what
> amounts to the same format, that's just used differently between the
> publisher->hub and the hub->subscriber.
They started off as the same thing but with the former having a "hub"
property. However, when I came to accommodate combining notifications
for multiple topics into a single notification I added an additional
level of indirection (the "notification packet") to have somewhere to
put the topic URL.
So a discovery document looks like this:
{
  "hubs": [ "https://example.com/hub" ],
  "items": [
    {
      "arbitraryPayload": 1
    }
  ]
}
while the notification message looks like this:
{
  "items": [
    {
      "topic": "http://example.com/topic",
      "payload": {
        "arbitraryPayload": 1
      }
    }
  ]
}
This minor difference is slightly annoying, but it seemed more desirable
to me than requiring the discovery document to redundantly restate the
topic URL for each payload.
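To make the relationship between the two documents concrete, here's a sketch of the transformation a hub would perform when it turns publisher-supplied items into a notification message. The function name `build_notification` is hypothetical; it just assumes the two structures shown above, with the hub supplying the topic URL it already knows from the subscription.

```python
import json

def build_notification(topic_url, discovery_doc):
    """Wrap each payload from a discovery document into notification items.

    The hub already knows the topic URL from the subscription, so it adds
    it to each item rather than requiring the publisher to repeat it.
    """
    doc = json.loads(discovery_doc)
    return {
        "items": [
            {"topic": topic_url, "payload": item}
            for item in doc.get("items", [])
        ]
    }

discovery = ('{"hubs": ["https://example.com/hub"],'
             ' "items": [{"arbitraryPayload": 1}]}')
notification = build_notification("http://example.com/topic", discovery)
print(json.dumps(notification))
```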
> * In section 5, it says "Implementations MUST ignore properties that
> they do not understand." Do you mean ignore and drop these properties,
> or ignore and preserve? I feel like it could be read either way.
I suppose I mean "drop"; I'm not sure what it would mean to "preserve"
in this context, since the hub is not ever required to reproduce the
root object of either document, so it is not apparent externally whether
the hub dropped or preserved the properties.
> * Section 6: "This section defines a message format that can be used
> by the publisher..." <- should that be subscriber, or am I misreading
> this entirely?
Yes, I believe I meant to say "subscriber" here, or alternatively I may
have originally completed the sentence as "convey the locations". Either
way it doesn't make sense as currently written. Thanks.
> * Section 7 doesn't feel right to me. What you want is an equivalence
> comparison for JSON objects; what you're describing is a specific
> optimized algorithm for one. This is just an implementation detail for
> the hub; it doesn't need to be part of the spec.
>
Is there a standard comparison I could reference? I want publishers to
be able to rely on a particular definition of "the same" vs. "different"
so that they don't get differing behavior for the same documents between
hubs.
Accidental differences I was anticipating were:
* Failure to handle non-ASCII characters correctly due to differing
encodings between the stored representation and the new representation.
* Some languages failing to distinguish between 1 and true, 0 and
false, "" and false, etc.
* Failures when the publisher changes whitespace usage and the hub
failed to normalize whitespace.
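As a sketch of the kind of equivalence test I have in mind (hypothetical function, not a standard): parsing both representations first takes care of the whitespace and encoding pitfalls, and an explicit type check guards against languages that conflate 1 and true, which is exactly the failure mode listed above. Note that a naive comparison in Python, for instance, treats `1 == True` as equal.

```python
import json

def json_equal(a, b):
    """Structural equality that keeps booleans distinct from numbers."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a is b
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    return type(a) is type(b) and a == b

# Parsing first normalizes whitespace and character-encoding differences.
old = json.loads('{"a": 1, "b": [true]}')
new = json.loads('{ "a" : 1,\n "b": [ true ] }')
print(json_equal(old, new))   # whitespace differences don't matter
print(json_equal(1, True))    # but 1 and true stay distinct
```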
> * What are the pros and cons of this proposal, versus wrapping JSON
> objects in an Atom feed?
>
I would consider it a failure if an implementation of this protocol had
to have both an Atom parser and a JSON parser around in order to
operate. The goal is a natural representation of JSON-native data;
encapsulating JSON into XML is not "natural".
Of course, the con is that hubs must support this additional format, but
since PubSubHubbub assumes a coupling between publisher and hub, this is
not a big deal: the publisher will use a hub that supports the
serialization formats it uses.
Thanks for the feedback! Hopefully I'll be able to produce another draft
later this week incorporating all of this.
Here's a strawman:
If the hub is published in a Link: header, then the hub handles
whole-document updates agnostic to the type of the body.
Otherwise, the hub is declared inside the entity body in a
format-specific way. In this case, the resource format and the
notification format are defined by whatever specification defined how to
find the hub URL.
This is consistent with the capabilities we'd expect consumers of this
information to have anyway; you can't parse the atom:link in Atom/RSS or
the "hubs" property in my JSON proposal without having support for those
specific payload formats, but that's okay because you wouldn't have been
able to parse the update notifications anyway.
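The strawman above might be sketched like this. `find_hub` is a hypothetical helper, and the Link-header parsing is deliberately simplistic; the point is only the precedence: a Link header means whole-document updates agnostic to the body type, otherwise discovery falls back to a format-specific mechanism, here the "hubs" property from my JSON proposal.

```python
import json
import re

def find_hub(headers, body, content_type):
    """Hypothetical discovery sketch for the strawman above.

    A Link: header with rel="hub" takes precedence and implies
    whole-resource subscription; otherwise look inside the entity body
    in a way specific to its format.
    """
    link = headers.get("Link", "")
    m = re.search(r'<([^>]+)>\s*;\s*rel="hub"', link)
    if m:
        return m.group(1), "whole-resource"
    if content_type == "application/json":
        hubs = json.loads(body).get("hubs", [])
        if hubs:
            return hubs[0], "stream"
    return None, None

hub, mode = find_hub({}, '{"hubs": ["https://example.com/hub"], "items": []}',
                     "application/json")
print(hub, mode)
```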
Generally I agree with this approach. The needs of a self-describing
document/payload are different than those of a document/payload that
requires the headers for correct interpretation. The same applies to
HTML and hAtom, where the hub link would be in the html <head> and the
headers are mostly irrelevant. This also has an effect on security,
where the generic HTTP version must preserve the fidelity of the
payload's headers, whereas the self-describing document can drop them.
To be fully general, the notification payload could be an HTTP response
encapsulated in an HTTP request. That way the subscriber will get all of
the HTTP headers exactly as returned by the topic URL, and most
importantly the subscriber will get the HTTP status code which will
allow the subscriber to detect, for example, a 410 Gone response
indicating that the resource has been permanently removed.
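A minimal sketch of what the subscriber side of that would look like, assuming the payload is a raw HTTP response carried verbatim. A real implementation would need to handle header folding, chunked bodies, and so on; this only shows why having the status line available matters.

```python
def parse_encapsulated_response(raw):
    """Split a raw HTTP response carried as a notification payload.

    Deliberately minimal: no chunked-encoding or header-folding support.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split(" ")[1])
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return status_code, headers, body

raw = (b"HTTP/1.1 410 Gone\r\n"
       b"Content-Type: application/json\r\n"
       b"\r\n")
status, headers, body = parse_encapsulated_response(raw)
print(status)  # the subscriber can now detect that the topic was removed
```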
The challenge with this approach, of course, is that some HTTP
frameworks may not provide a convenient way to parse an arbitrary
payload as an HTTP request, and it complicates the common case of just
getting notifications of changes to the content.
Given that the hubbub architecture has a coupling between the hub and
publisher, it's left to the publisher to choose a hub that suits the
publisher's needs.
I think the baseline generic implementation such as on app engine should
aspire to be as generic as possible so that it doesn't need to be
altered for each new application requirement, and this is what the
specification should describe. You can't really be "smart" and generic
at the same time.
However, a publisher is free to choose a hub that provides additional
value-add services such as normalization, if the normalization that the
hub does makes sense for the payload the publisher offers.
The PubSubHubbub protocol has two "sides": publisher to hub, and hub to
subscriber.
There's no reason why you can't just implement one "side" of this, and
do something custom on the other side, if you have unique needs that the
baseline protocol doesn't provide for. Several implementations already
use subsets of hubbub in some situations, and that's fine.
I'm not sure what you mean by it being "less RESTful" to have the hub
treat the payload as opaque.
In my JSON PubSubHubbub draft I used the term "stream resource" to
describe the more general idea of a resource that consists of a bunch of
items which each have distinct lifetimes, in an attempt to express a
superset containing the idea of a feed of the kind you'd expect to load
into a "feed reader" for human consumption as well as more
machine-oriented data.
However, I'm by no means wedded to the idea.
Perhaps it would be better to make a distinction on type of subscription
rather than type of resource.
All resources support a "whole-resource subscription" which re-delivers
the entire resource each time it changes.
Some resources support "stream-based subscription" which delivers new
items and changes to existing items in a particular media type.
The Link header defines a resource's hub for whole-resource
subscription, but a hub for a stream-based subscription is determined
via a mechanism specific to the stream media type.
I don't see any reason why someone would use a whole-resource
subscription for an Atom feed, but the ability to express it comes from
the fact that a Link header indicates a hub that accepts whole-resource
subscriptions, and a Link header can be published on any resource.
Really I was just trying to be clear about the meaning of this rather
than suggesting that you would actually do it in practice.
The other alternative would be to say that you are simply not allowed to
publish a Link header if your resource is a stream/feed resource, but I
don't think that restriction would really buy us much.
Existing practice for exporting blogs is often a whole feed, but I agree it isn't a common case; on the other hand, with something like a contacts list, completeness is important.
There is an assumption with PuSH at the moment that you can always fetch the original feed. With Monica's addition of the REST verbs, this gets a little trickier.
On Jun 15, 2010 8:16 PM, "Martin Atkins" <ma...@degeneration.co.uk> wrote:
On 06/15/2010 06:49 PM, Monica Keller wrote:
> Do we think that people will ever want "whole" re...