PubSubHubbub for JSON

Martin Atkins

Jun 5, 2010, 3:33:32 PM

to pubsub...@googlegroups.com

Hi all,

Today I wrote up a first draft for a variation on PubSubHubbub for
JSON-based applications:

http://martin.atkins.me.uk/specs/pubsubhubbub-json

As noted in the specification document, the goal is to make the hub as
dumb as possible, having the payloads just be opaque JSON objects as far
as the hub is concerned. This means that this can be used for multiple
JSON-based applications without changing the hub.

The near-term goal is to allow the fledgling "Activity Streams API" spec
to use this as a transport, but I also want to make it useful for
delivering other content and for proprietary payloads like those used in
Facebook's real-time notifications API.

It'd be nice if this format could be supported alongside Atom and RSS in
the App Engine hub reference implementation, though I've not yet dug
deeply into the code to figure out how hard that might be.

Matthew Terenzio

Jun 5, 2010, 5:34:27 PM

to pubsub...@googlegroups.com

Very cool. I didn't read it word for word yet, but I like the direction.

Monica Keller

Jun 5, 2010, 5:47:20 PM

to pubsub...@googlegroups.com

Cool!
Bret and I met on Friday and agreed on the minimal changes needed to support arbitrary formats in PubSubHubbub.

I will post what we discussed on the wiki in detail, but it's pretty straightforward, just using HTTP headers:

Web Linking for the hub and topic URL, and the Date header for the update date.

Alexis Richardson

Jun 5, 2010, 5:51:38 PM

to pubsub...@googlegroups.com

Excellent! We like arbitrary formats...

Julien Genestoux

Jun 5, 2010, 5:54:28 PM

to pubsub...@googlegroups.com

Same here! Glad to see this coming :)

Martin Atkins

Jun 5, 2010, 8:58:30 PM

to pubsub...@googlegroups.com

On 06/05/2010 02:47 PM, Monica Keller wrote:
> Cool !
> Bret and I met on Friday and agreed on the minimal changes needed to
> support arbitrary formats in PubSubHubbub
>
> I will post what we discussed on the wiki in detail but its pretty
> straightforward just using HTTP Headers
>
> Web Linking for hub and topic url and Date header for update date.
>

So I guess this would just monitor an entire resource and post the
updated version to the subscriber whenever it changes, rather than
dividing the resource into logical "items" and delivering them separately?

That seems like a fine fallback for non-stream-like resources like
images but I do think it's useful to define a more granular stream
format for JSON so that it can be used to deliver things like activity
streams more efficiently.

Ravi Pinjala

Jun 6, 2010, 1:27:26 PM

to pubsub...@googlegroups.com

This looks interesting! Here are some thoughts I had while reading through it:

* It would be nice if there was a Content-Type required (or at least
encouraged) for topics, so that hubs wouldn't have to infer the
content type from the content. This would make it more efficient for a
single hub to support both Atom/RSS and JSON streams.

* What's the relation between the Discovery Document and the
Notification Message? It seems like it's two different names for what
amounts to the same format, that's just used differently between the
publisher->hub and the hub->subscriber.

* In section 5, it says "Implementations MUST ignore properties that
they do not understand." Do you mean ignore and drop these properties,
or ignore and preserve? I feel like it could be read either way.

* Section 6: "This section defines a message format that can be used
by the publisher..." <- should that be subscriber, or am I misreading
this entirely?

* Section 7 doesn't feel right to me. What you want is an equivalence
comparison for JSON objects; what you're describing is a specific
optimized algorithm for one. This is just an implementation detail for
the hub; it doesn't need to be part of the spec.

* What are the pros and cons of this proposal, versus wrapping JSON
objects in an Atom feed?

--Ravi

Monica Keller

Jun 6, 2010, 2:55:33 PM

to Pubsubhubbub

This is what I am proposing:

If you want to subscribe to a resource, look at the response headers.
First look for rel="hub" using Web Linking, as described at
http://code.google.com/p/pubsubhubbub/wiki/ArbitraryContentTypes
If it is not present, look at the Content-Type and map it to the hub
location. We currently know how to map application/atom+xml and
application/rss+xml.
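
As a rough illustration of the lookup order described above (a sketch, not part of the wiki proposal): check the Link header for rel="hub" first, then fall back to the Content-Type. The function names, the returned labels, and the assumption that headers arrive as a plain dict with canonical header names are all hypothetical.

```python
import re

# Content types whose hub is declared inside the body (the two the thread
# says are currently known). This table is an illustrative assumption.
BODY_DISCOVERY_TYPES = {"application/atom+xml", "application/rss+xml"}


def find_link_hubs(headers):
    """Return the URLs of all Link header entries whose rel includes "hub"."""
    hubs = []
    link_value = headers.get("Link", "")
    # Each entry looks like: <url>; param="value"; ...
    for url, params in re.findall(r'<([^>]*)>([^<]*)', link_value):
        match = re.search(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params, re.I)
        if match:
            rels = (match.group(1) or match.group(2) or "").lower().split()
            if "hub" in rels:
                hubs.append(url)
    return hubs


def discovery_strategy(headers):
    """Decide where the hub should be looked up for a fetched resource."""
    hubs = find_link_hubs(headers)
    if hubs:
        return ("link-header", hubs)
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type in BODY_DISCOVERY_TYPES:
        return ("in-body", content_type)
    return ("none", None)
```

For example, a JPEG served with `Link: <https://hub.example.com/>; rel="hub"` would yield the "link-header" strategy, while an Atom feed without a Link header would yield "in-body".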

We need a content type for JSON activity streams or JSON feeds,
something more specific than
application/json

Ex: application/feed+json; then we can describe having a "hubs" element
or whatever we decide.

I think using HTTP headers will be quite useful, as we can syndicate
CSV, Excel, JPEGs, etc.

Take a look at the wiki page if you have a chance and let me know what
you think.

Martin Atkins

Jun 6, 2010, 3:34:03 PM

to pubsub...@googlegroups.com

On 06/06/2010 10:27 AM, Ravi Pinjala wrote:
> This looks interesting! Here are some thoughts I had while reading through it:
>
> * It would be nice if there was a Content-Type required (or at least
> encouraged) for topics, so that hubs wouldn't have to infer the
> content type from the content. This would make it more efficient for a
> single hub to support both Atom/RSS and JSON streams.
>

Agreed.

In light of Monica's proposal for generic content-types, it seems that
the specific message format proposed in my spec should have its own
specialized MIME type so that it doesn't overlap with the use of the
general type application/json, which might be used for JSON documents
that do not conform to the specific structure I called out in the
specification.

Since this specification defines two differing JSON object structures it
should probably define a MIME type for each.

I imagine then that there would be several MIME types which are
considered "special" to a hubbub hub, where piecemeal delivery is
possible, and then any others would be treated as opaque blobs as
described in Monica's proposal.

> * What's the relation between the Discovery Document and the
> Notification Message? It seems like it's two different names for what
> amounts to the same format, that's just used differently between the
> publisher->hub and the hub->subscriber.

They started off as the same thing but with the former having a "hub"
property. However, when I came to accommodate combining notifications
for multiple topics into a single notification I added an additional
level of indirection (the "notification packet") to have somewhere to
put the topic URL.

So a discovery document looks like this:

{
    "hubs": [ "https://example.com/hub" ],
    "items": [
        {
            "arbitraryPayload": 1
        }
    ]
}

while the notification message looks like this:

{
    "items": [
        {
            "topic": "http://example.com/topic",
            "payload": {
                "arbitraryPayload": 1
            }
        }
    ]
}

This minor difference is slightly annoying, but it seemed more desirable
to me than requiring the discovery document to redundantly re-state the
topic URL for each payload.
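
To make the relationship between the two shapes concrete, here is a minimal sketch of how a hub might wrap the items of a discovery document into a notification message. The function name `to_notification` is hypothetical; only the "hubs", "items", "topic", and "payload" property names come from the examples above.

```python
def to_notification(topic_url, discovery_doc):
    """Wrap each opaque item of a discovery document in a notification packet.

    The hub never inspects the payload; it only attaches the topic URL.
    """
    return {
        "items": [
            {"topic": topic_url, "payload": payload}
            for payload in discovery_doc.get("items", [])
        ]
    }


discovery = {
    "hubs": ["https://example.com/hub"],
    "items": [{"arbitraryPayload": 1}],
}
notification = to_notification("http://example.com/topic", discovery)
```

Because each packet carries its own topic URL, packets from several topics could be concatenated into one "items" list without ambiguity.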

> * In section 5, it says "Implementations MUST ignore properties that
> they do not understand." Do you mean ignore and drop these properties,
> or ignore and preserve? I feel like it could be read either way.

I suppose I mean "drop"; I'm not sure what it would mean to "preserve"
in this context, since the hub is not ever required to reproduce the
root object of either document, so it is not apparent externally whether
the hub dropped or preserved the properties.

> * Section 6: "This section defines a message format that can be used
> by the publisher..."<- should that be subscriber, or am I misreading
> this entirely?

Yes, I believe I meant to say "subscriber" here, or alternatively I may
have originally completed the sentence as "convey the locations". Either
way it doesn't make sense as currently written. Thanks.

> * Section 7 doesn't feel right to me. What you want is an equivalence
> comparison for JSON objects; what you're describing is a specific
> optimized algorithm for one. This is just an implementation detail for
> the hub; it doesn't need to be part of the spec.
>

Is there a standard comparison I could reference? I want publishers to
be able to rely on a particular definition of "the same" vs. "different"
so that they don't get differing behavior for the same documents between
hubs.

Accidental differences I was anticipating were:

* Failure to handle non-ASCII characters correctly due to differing
encodings between the stored representation and the new representation.

* Some languages failing to distinguish between 1 and true, 0 and
false, "" and false, etc.

* Failures when the publisher changes whitespace usage and the hub
failed to normalize whitespace.
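
A sketch of a comparison that avoids the three accidental differences listed above, assuming the hub parses both documents before comparing them. This is an illustration, not the algorithm from section 7 of the draft; `json_equal` and `documents_equal` are hypothetical names, and treating 1 and 1.0 as the same JSON number is an assumption.

```python
import json


def _kind(value):
    """Map a parsed value to its JSON type name."""
    if isinstance(value, bool):  # must come before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def json_equal(a, b):
    """Type-strict structural equality: 1 != true, "" != false."""
    kind = _kind(a)
    if kind != _kind(b):
        return False
    if kind == "array":
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if kind == "object":
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return a == b


def documents_equal(raw_a: bytes, raw_b: bytes) -> bool:
    """Compare two serialized documents after parsing.

    Parsing discards whitespace differences and decodes both inputs to
    Unicode, so differing encodings or escapes do not matter.
    """
    return json_equal(json.loads(raw_a), json.loads(raw_b))
```

The explicit type check matters in Python in particular, where `1 == True` evaluates to true and a plain `==` on parsed documents would treat the two as the same.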

> * What are the pros and cons of this proposal, versus wrapping JSON
> objects in an Atom feed?
>

I would consider it a failure if an implementation of this protocol had
to have both an Atom parser and a JSON parser around in order to
operate. The goal is a natural representation of JSON-native data;
encapsulating JSON into XML is not "natural".

Of course, the con is that hubs must have support for this additional
format, but since PubSubHubbub assumes a coupling between publisher and
hub this is not a big deal since the publisher will use a hub that
supports the serialization formats it uses.

Thanks for the feedback! Hopefully I'll be able to produce another draft
later this week incorporating all of this.

Martin Atkins

Jun 7, 2010, 12:08:27 PM

to pubsub...@googlegroups.com

I think we need an unambiguous way to determine whether a particular hub
for a particular resource is for whole-document update notifications or
whether it's capable of format-specific delta notifications. That way
subscribers will know what to expect before they subscribe, rather than
subscribing and then finding that they aren't getting the kind of
notification they wanted.

Here's a strawman:

If the hub is published in a Link: header, then the hub handles
whole-document updates agnostic to the type of the body.

Otherwise, the hub is declared inside the entity body in a
format-specific way. In this case, the resource format and the
notification format are defined by whatever specification defined how to
find the hub URL.

This is consistent with the capabilities we'd expect consumers of this
information to have anyway; you can't parse the atom:link in Atom/RSS or
the "hubs" property in my JSON proposal without having support for those
specific payload formats, but that's okay because you wouldn't have been
able to parse the update notifications anyway.
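
The strawman can be sketched as a small decision function, shown here for the JSON case only. It assumes the caller has already extracted any rel="hub" URLs from the Link header; the function name and the mode labels are hypothetical, and "hubs" is the property from the JSON draft.

```python
import json


def classify_hub(link_hubs, body_text):
    """Apply the strawman: a Link header hub means whole-document updates;
    otherwise look for a format-specific declaration in the body."""
    if link_hubs:
        return {"mode": "whole-document", "hubs": list(link_hubs)}
    try:
        doc = json.loads(body_text)
    except ValueError:
        # The body is not JSON, so this sketch cannot find an in-body hub.
        return {"mode": "none", "hubs": []}
    hubs = doc.get("hubs", []) if isinstance(doc, dict) else []
    if hubs:
        return {"mode": "format-specific", "hubs": hubs}
    return {"mode": "none", "hubs": []}
```

A subscriber that cannot parse the body format simply never reaches the "format-specific" branch, which matches the observation that it could not have parsed the notifications either.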

Brett Slatkin

Jun 7, 2010, 12:32:55 PM

to pubsub...@googlegroups.com

On Mon, Jun 7, 2010 at 9:08 AM, Martin Atkins <ma...@degeneration.co.uk> wrote:
>
> I think we need an unambiguous way to determine whether a particular hub for
> a particular resource is for whole-document update notifications or whether
> it's capable of format-specific delta notifications. That way subscribers
> will know what to expect before the subscribe and find that they aren't
> getting the kind of notification they wanted.
>
> Here's a strawman:
>
> If the hub is published in a Link: header, then the hub handles
> whole-document updates agnostic to the type of the body.
>
> Otherwise, the hub is declared inside the entity body in a format-specific
> way. In this case, the resource format and the notification format are
> defined by whatever specification defined how to find the hub URL.
>
> This is consistent with the capabilities we'd expect consumers of this
> information to have anyway; you can't parse the atom:link in Atom/RSS or the
> "hubs" property in my JSON proposal without having support for those
> specific payload formats, but that's okay because you wouldn't have been
> able to parse the update notifications anyway.

Generally I agree with this approach. The needs of a self-describing
document/payload are different than those of a document/payload that
requires the headers for correct interpretation. The same applies to
HTML and hAtom, where the hub link would be in the html <head> and the
headers are mostly irrelevant. This also has an effect on security,
where the generic HTTP version must preserve the fidelity of the
payload's headers, whereas the self-describing document can drop them.

John Panzer

Jun 7, 2010, 3:06:34 PM

to pubsub...@googlegroups.com

The other nice thing about self-describing data is that the hub metadata can survive things like re-syndication (which seems to mostly make sense for self-describing, stream-like data so far.)

--
John Panzer / Google
jpa...@google.com / abstractioneer.org / @jpanzer

Martin Atkins

Jun 7, 2010, 4:31:49 PM

to pubsub...@googlegroups.com

On 06/07/2010 09:32 AM, Brett Slatkin wrote:
>
> Generally I agree with this approach. The needs of a self-describing
> document/payload are different than those of a document/payload that
> requires the headers for correct interpretation. The same applies to
> HTML and hAtom, where the hub link would be in the html<head> and the
> headers are mostly irrelevant. This also has an effect on security,
> where the generic HTTP version must preserve the fidelity of the
> payload's headers, whereas the self-describing document can drop them.

To be fully general, the notification payload could be an HTTP response
encapsulated in an HTTP request. That way the subscriber will get all of
the HTTP headers exactly as returned by the topic URL, and most
importantly the subscriber will get the HTTP status code which will
allow the subscriber to detect, for example, a 410 Gone response
indicating that the resource has been permanently removed.

The challenge with this approach, of course, is that some HTTP
frameworks may not provide a convenient way to parse an arbitrary
payload as an HTTP request, and it complicates the common case of just
getting notifications of changes to the content.
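
For a sense of what the subscriber side would involve, here is a minimal sketch that parses a serialized HTTP response received as a request body, using only the Python standard library. The in-memory socket stand-in is a common workaround rather than an official API for this purpose, and the function name is hypothetical.

```python
import io
from http.client import HTTPResponse


class _BytesSocket:
    """Minimal socket stand-in so HTTPResponse can read from memory."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._buffer


def parse_encapsulated_response(raw: bytes):
    """Return (status, headers, body) from a serialized HTTP response."""
    response = HTTPResponse(_BytesSocket(raw))
    response.begin()  # reads the status line and headers
    headers = dict(response.getheaders())
    body = response.read()
    return response.status, headers, body


raw = (
    b"HTTP/1.1 410 Gone\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"gone"
)
status, headers, body = parse_encapsulated_response(raw)
```

Here `status` is the 410 from the topic URL, which is exactly the information that would be lost if only the body were forwarded.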

Monica Keller

Jun 7, 2010, 9:44:09 PM

to pubsub...@googlegroups.com

I would like it to be very easy to consume arbitrary formats.
The question is how much the hub should normalize the response from the publisher to help the subscribers. Is it a dumb rebroadcaster which maintains fidelity of the publisher's response, or is it a smarter agent which validates before re-syndicating?

I was leaning towards the latter

Producing a more RESTful PUSH specifying the method: POST, PUT, DELETE, PATCH

For example if the publisher returns a 404 the hub is not necessarily going to send that to the subscribers.

If the publisher returns the same resource that also would not be sent to the subscribers.

If the publisher returns a 410 then we can just use HTTP DELETE or the X-Method-Override to pass it in.
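
The three cases above could be sketched as a hub-side decision like the following. The function name, the use of a SHA-256 digest to detect an unchanged resource, and the choice of POST for every content change are assumptions; the thread does not settle when PUT or PATCH would apply instead.

```python
import hashlib


def plan_push(status, body, last_digest):
    """Decide what, if anything, to send to subscribers after a fetch.

    Returns (method, digest): method is None when nothing should be sent,
    and digest is the value to remember for the next comparison.
    """
    if status == 410:
        # Resource permanently removed: tell subscribers to delete it.
        return ("DELETE", None)
    if status != 200:
        # Errors such as 404 are not forwarded to subscribers.
        return (None, last_digest)
    digest = hashlib.sha256(body).hexdigest()
    if digest == last_digest:
        # Same resource as last time: nothing to send.
        return (None, last_digest)
    return ("POST", digest)
```

A hub following this sketch would store the returned digest per topic and pass it back in as `last_digest` on the next fetch.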

Overall I prefer the model where the hubs do most of the work, potentially offering additional services such as translation of formats or malware detection. So I would rather not make it a requirement to retransmit the original HTTP response in its entirety, and would instead focus on the specific headers needed.

We should probably prototype the alternatives and let several developers try to handle the pushes.

Monica Keller

Jun 7, 2010, 9:54:18 PM

to Pubsubhubbub

Just to clarify: what I am advocating here is that we not require hubs
to syndicate the publisher's entire HTTP response, but rather select the
minimum fields and metadata needed for the subscriber to process the
PUSH. Additional fields/data can be transmitted but would be
outside the scope of the spec, and would thus allow different hubs to
provide additional value.
So some hubs could offer the raw responses, but not all would have to.

On Jun 7, 6:44 pm, Monica Keller <monica.kel...@gmail.com> wrote:
> I would like it to be very easy to consume arbitrary formats.
> The question is how much should the hub should normalize the response from
> the publisher to help the subscribers ? Is it a dumb rebroadcaster which
> maintains fidelity of the publisher's response or is it a smarter agent
> which validates before re syndicating ?
>
> I was leaning towards the latter
>
> Producing a more RESTful PUSH specifying the method: POST, PUT, DELETE,
> PATCH
>
> For example if the publisher returns a 404 the hub is not necessarily going
> to send that to the subscribers.
>
> If the publisher returns the same resource that also would not be sent to
> the subscribers.
>
> If the publisher returns a 410 then we can just use HTTP DELETE or the
> X-Method-Override to pass it in.
>
> Overall I prefer the model where the hubs do most of the work. Potentially
> offering additional services such as translation of formats or malware
> detection so I would rather not make it a requirement to re transmit the
> original HTTP response in its entirety and focus on the specific headers
> needed.
>
> We should probably prototype the alternatives and let several developers try
> and handle the pushes
>

Martin Atkins

Jun 8, 2010, 12:28:51 PM

to pubsub...@googlegroups.com

On 06/07/2010 06:44 PM, Monica Keller wrote:
> I would like it to be very easy to consume arbitrary formats.
> The question is how much should the hub should normalize the response
> from the publisher to help the subscribers ? Is it a dumb rebroadcaster
> which maintains fidelity of the publisher's response or is it a smarter
> agent which validates before re syndicating ?
>

Given that the hubbub architecture has a coupling between the hub and
publisher, it's left to the publisher to choose a hub that suits the
publisher's needs.

I think the baseline generic implementation, such as the one on App Engine, should
aspire to be as generic as possible so that it doesn't need to be
altered for each new application requirement, and this is what the
specification should describe. You can't really be "smart" and generic
at the same time.

However, a publisher is free to choose a hub that provides additional
value-add services such as normalization, if the normalization that the
hub does makes sense for the payload the publisher offers.

Monica Keller

Jun 8, 2010, 1:29:09 PM

to pubsub...@googlegroups.com

I believe the proposal at http://code.google.com/p/pubsubhubbub/wiki/ArbitraryContentTypes is fairly generic but does not prevent hubs from producing something more advanced. If we put into the spec that the hub must retransmit the response from the publisher verbatim, we are making it harder for hubs to provide additional services, and it's less RESTful.

Martin Atkins

Jun 8, 2010, 2:48:52 PM

to pubsub...@googlegroups.com

On 06/08/2010 10:29 AM, Monica Keller wrote:
> I believe the proposal at
> http://code.google.com/p/pubsubhubbub/wiki/ArbitraryContentTypes is
> fairly generic but does not prevent hubs to produce something more
> advanced. If we put into the spec that the hub must retransmit the
> response from publisher verbatim we are making it harder for them to
> provide additional services and its less RESTful
>

The PubSubHubbub protocol has two "sides": publisher to hub, and hub to
subscriber.

There's no reason why you can't just implement one "side" of this, and
do something custom on the other side, if you have unique needs that the
baseline protocol doesn't provide for. Several implementations already
use subsets of hubbub in some situations, and that's fine.

I'm not sure what you mean by it being "less RESTful" to have the hub
treat the payload as opaque.

Monica Keller

Jun 15, 2010, 3:19:43 PM

to Pubsubhubbub

Hi Guys,
So I am updating the proposed v0.4 spec with the latest discussions
and was wondering if you had any suggestions on the proper
term for self-describing list resources like Atom, RSS or JSON
Activity Streams/Feeds.

An XXX document would contain a list of items with unique identifiers
and timestamps.

Ideas ?

On Jun 7, 9:32 am, Brett Slatkin <bslat...@gmail.com> wrote:

Kevin Marks

Jun 15, 2010, 4:28:35 PM

to pubsub...@googlegroups.com

Is 'Feed' a usable term? That's widely used for Atom + RSS, but I've seen it used for JSON too, e.g. http://jsonduit.com/

Martin Atkins

Jun 15, 2010, 4:39:50 PM

to pubsub...@googlegroups.com

On 06/15/2010 01:28 PM, Kevin Marks wrote:
> Is 'Feed' a usable term? Thats widely used or Atom + RSS, but I've seen
> it used for JSON too, eg http://jsonduit.com/
>

In my JSON PubSubHubbub draft I used the term "stream resource" to
describe the more general idea of a resource that consists of a bunch of
items which each have distinct lifetimes, in an attempt to express a
superset containing the idea of a feed of the kind you'd expect to load
into a "feed reader" for human consumption as well as more
machine-oriented data.

However, I'm by no means wedded to the idea.

Perhaps it would be better to make a distinction on type of subscription
rather than type of resource.

All resources support a "whole-resource subscription" which re-delivers
the entire resource each time it changes.

Some resources support "stream-based subscription" which delivers new
items and changes to existing items in a particular media type.

The Link header defines a resource's hub for whole-resource
subscription, but a hub for a stream-based subscription is determined
via a mechanism specific to the stream media type.
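
To illustrate the difference in what gets delivered, here is a small sketch of how a hub might compute the payload for a stream-based subscription: only items that are new or changed since the previous fetch. It assumes each item carries a unique "id" property, which is a hypothetical convention for this example and not something the JSON draft requires of opaque payloads.

```python
def stream_delta(previous, current):
    """Return the items in `current` that are new or differ from `previous`.

    Both arguments are lists of item dicts keyed by a hypothetical "id".
    A whole-resource subscription would instead redeliver all of `current`.
    """
    old_by_id = {item["id"]: item for item in previous}
    return [item for item in current if old_by_id.get(item["id"]) != item]


before = [{"id": "a", "title": "First"}, {"id": "b", "title": "Second"}]
after = [
    {"id": "a", "title": "First"},
    {"id": "b", "title": "Second (edited)"},
    {"id": "c", "title": "Third"},
]
delta = stream_delta(before, after)
```

Note that this sketch ignores ordering and deletions, so a change that only reorders or removes items would produce an empty delta.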

Monica Keller

Jun 15, 2010, 9:49:52 PM

to pubsub...@googlegroups.com

Thanks !
Feed and Stream are both good terms. Slight preference for Feed based on the terminology already in the 0.3 spec.

Do we think that people will ever want "whole" resource notification for a feed? I can't think of a use case, so I am leaving that part out for now and just focusing on describing topics as feeds and non-feeds.

Also, I would rather not require that feeds always have the hub declaration inside the content, given that, for example, existing parsers that fetch lists/arrays would now need a head item in the body to declare the hub.

Martin Atkins

Jun 15, 2010, 11:15:57 PM

to pubsub...@googlegroups.com

On 06/15/2010 06:49 PM, Monica Keller wrote:
>
> Do we think that people will ever want "whole" resource notification for
> a feed ? I can't think of a use case so I am leaving that part out for
> now and just focusing on describing topics as feeds and non feeds
>

I don't see any reason why someone would use a whole-resource
subscription for an Atom feed but the ability to express it comes from
the fact that a Link header indicates a hub that accepts whole-resource
subscriptions, and a Link header can be published on any resource.
Really I was just trying to be clear about the meaning of this rather
than suggesting that you would actually do it in practice.

The other alternative would be to say that you are simply not allowed to
publish a Link header if your resource is a stream/feed resource, but I
don't think that restriction would really buy us much.

Kevin Marks

Jun 15, 2010, 11:47:08 PM

to pubsub...@googlegroups.com

Existing practice for exporting blogs is often a whole feed, but I agree it isn't a common case; on the other hand with something like a contacts list completeness is important.
There is an assumption with PUSH at the moment that you can always fetch the original feed. With Monica's addition of the REST verbs, this gets a little more tricky.

On Jun 15, 2010 8:16 PM, "Martin Atkins" <ma...@degeneration.co.uk> wrote:

On 06/15/2010 06:49 PM, Monica Keller wrote:
>
>

> Do we think that people will ever want "whole" re...

John Panzer

Jun 16, 2010, 2:28:17 AM

to pubsub...@googlegroups.com

Consider non time oriented feeds (like top 10 lists). They exist; a
change could be just a reordering...

--
