PubSubHubbub and Twitter RSS


Jesse Stay

Aug 9, 2009, 12:06:10 AM
to twitter-deve...@googlegroups.com
I know Twitter has bigger priorities, so if you can put this on your "to think about" list for after the DDoS problems are taken care of, I'd appreciate it.  Perhaps this question is for John, since it has to do with real-time.  Anyway, is there any plan to support the PubSubHubbub protocol on Twitter's per-user RSS feeds?  I think it could be a great alternative to Twitter's real-time offerings that's standards-compliant and open.  It would also make things really easy for me on a project I'm working on.  Here's the standard in case anyone needs a refresher:


You guys would rule if you supported this.  It would probably also put less strain on what you're doing now for real-time feeds, and it could cut down on repeated polling of the RSS feeds.
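For anyone who hasn't looked at the spec in a while, the subscriber side amounts to a single form-encoded POST to a hub.  A rough sketch in Python - the hub and callback URLs here are made up, and the topic is just a per-user RSS feed:

# Minimal sketch of a PubSubHubbub subscription request.
# The hub and callback URLs are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen

hub = "https://hub.example.com/"  # hypothetical hub endpoint
params = urlencode({
    "hub.mode": "subscribe",
    "hub.topic": "http://twitter.com/statuses/user_timeline/12345.rss",  # feed to follow
    "hub.callback": "https://myapp.example.com/push",  # hub POSTs new entries here
    "hub.verify": "sync",
})
# After verifying the callback, the hub pushes new feed entries to it as they
# appear, so the subscriber never has to poll the RSS feed again.
urlopen(hub, data=params.encode("utf-8"))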

Jesse

John Kalucki

Aug 9, 2009, 2:23:07 AM
to Twitter Development Talk
Jesse,

I've looked into PubSubHubbub, as have others at Twitter. It's not on
our roadmap, because the Streaming API meets most of our developers'
real-time and push needs. There are holes, to be sure, and we have
features on the roadmap to plug those holes as priority and schedules
permit. But, we've proven the platform at modest scale, and believe
that it will scale up significantly with little additional effort.

There also may be some interesting scaling issues with a Request-
Response push mechanism that are avoided with a streaming approach.
We'd need quite a farm of threads to have sufficient outbound
throughput against the RTT latency of an HTTP post. I would have to
assume that nearly all high-volume updaters and most mid-volume
updaters would be pushed to a non-trivial number of hubs. Tractable,
but it would require some effort, especially to deal with unreliable
and slow hubs.
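
To put rough, purely hypothetical numbers on that bottleneck:

# Purely illustrative arithmetic: how many concurrent threads a blocking
# request-response push would need. All numbers are hypothetical.
rtt_s = 0.100            # assumed round-trip time per HTTP POST to a hub
updates_per_s = 1000     # assumed rate of statuses that need pushing
hubs_per_update = 5      # assumed number of hubs each status fans out to

posts_per_s = updates_per_s * hubs_per_update      # 5,000 POSTs per second
posts_per_thread = 1 / rtt_s                       # ~10 blocking POSTs per second per thread
threads_needed = posts_per_s / posts_per_thread    # ~500 concurrent threads
print(threads_needed)  # 500.0, before accounting for slow or unreliable hubs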

If you are looking at RSS feeds, I'd guess that you are looking at
grabbing user timelines. The Streaming API already supports this via
the /follow resource. If this doesn't meet your needs, email me with
your requirements and we'll see how we can support your use case.
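
For illustration, a /follow consumer is just one long-lived HTTP request
read line by line. A rough sketch in Python; the endpoint path, the POSTed
follow parameter, and Basic auth reflect the current docs as I recall them,
so treat the details as assumptions and check the documentation:

# Rough sketch of consuming the Streaming API /follow resource: a single
# long-lived request whose response body is newline-delimited JSON statuses.
import base64
import json
from urllib.request import Request, urlopen

USER, PASSWORD = "me", "secret"   # placeholder credentials
follow_ids = "12345,67890"        # user ids whose timelines we want pushed

req = Request(
    "http://stream.twitter.com/follow.json",
    data=("follow=" + follow_ids).encode("utf-8"),
)
req.add_header(
    "Authorization",
    "Basic " + base64.b64encode((USER + ":" + PASSWORD).encode()).decode(),
)

stream = urlopen(req)   # the connection stays open indefinitely
for raw in stream:      # each line is one status, or a keep-alive newline
    line = raw.strip()
    if line:
        status = json.loads(line)
        print(status["user"]["screen_name"], status["text"])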

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

Jesse Stay

Aug 9, 2009, 12:35:04 PM
to twitter-deve...@googlegroups.com
On Sun, Aug 9, 2009 at 2:23 AM, John Kalucki <jkal...@gmail.com> wrote:
> There also may be some interesting scaling issues with a Request-
> Response push mechanism that are avoided with a streaming approach.
> We'd need quite a farm of threads to have sufficient outbound
> throughput against the RTT latency of an HTTP post. I would have to
> assume that nearly all high-volume updaters and most mid-volume
> updaters would be pushed to a non-trivial number of hubs. Tractable,
> but it would require some effort, especially to deal with unreliable
> and slow hubs.

No, not necessarily.  With HTTP pipelining and persistent connections, the cost on your end should be relatively small, possibly even less than what you're spending now, while using an open standard everyone is already familiar with.  See this:


My reason for suggesting this, while I understand you already have a way to do it, is that it builds your API around existing protocols.  That means less development cost on your end, less development cost for the developers who want to implement it, and Twitter becomes more of a utility and less of a walled garden where the streaming feed is concerned.  In the end, with community (and Twitter's) involvement, I think you'd see much lower cost on your end by adopting an open standard like this than by integrating your own solution.  I'd really like to see Twitter join the rest of the community in building on these open standards - I think it would be a huge value to the open-standards community regardless.
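
To make the persistent-connection point concrete, here's a minimal sketch: one HTTP/1.1 keep-alive socket carrying several publish pings back to back (the hub host is made up, and the feed URLs are just per-user RSS feeds):

# Sketch of reusing one keep-alive connection for many notifications.
# The hub host and path are hypothetical.
import http.client
from urllib.parse import urlencode

conn = http.client.HTTPConnection("hub.example.com")  # opened once, reused below
headers = {"Content-Type": "application/x-www-form-urlencoded"}

for feed in ("http://twitter.com/statuses/user_timeline/1.rss",
             "http://twitter.com/statuses/user_timeline/2.rss"):
    body = urlencode({"hub.mode": "publish", "hub.url": feed})
    conn.request("POST", "/", body, headers)
    conn.getresponse().read()  # drain the response so the same socket can be reused
conn.close()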

Also, add to that the potential for distribution during an event like this DDoS.  Twitter could simply use FeedBurner and other hubs to distribute its content in real time, at even less cost to its production environment, with more developers embracing the platform.  Twitter could even do this selectively if the intent is to monetize the full firehose, making only user timelines pubsub-accessible and available to third-party hubs like FeedBurner.  I think it would be a huge win for Twitter.

Jesse

Nick Arnett

Aug 9, 2009, 12:53:49 PM
to twitter-deve...@googlegroups.com

Couldn't app developers do this on their own, by allowing the user to configure "Also publish to a PubSubHubbub server" in the app?  There's a potential revenue stream there for developers - charge a small fee for this use of the server.  That would make the system even more robust, since there would still be a publishing path even if Twitter were completely down.
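
As a sketch of what that app-side option might boil down to (hub and feed URLs are hypothetical): after the app writes the user's new status into its own feed, notifying a hub is a single form-encoded POST:

# Hypothetical helper an app could call after appending a new status to its
# own feed: a PubSubHubbub "publish" ping telling the hub the feed changed.
from urllib.parse import urlencode
from urllib.request import urlopen

def ping_hub(hub_url, feed_url):
    """Tell a PubSubHubbub hub that feed_url has new content."""
    data = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("utf-8")
    return urlopen(hub_url, data=data).status  # the spec calls for 204 No Content

# e.g., after the app writes the user's new status into its own feed:
# ping_hub("https://pubsubhubbub.appspot.com/", "https://myapp.example.com/users/jesse.atom")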

Seems to me that there are good reasons for both to exist... and I don't see why Twitter needs to take the lead on this.  Current Twitter apps are sort of like email clients that can only talk to one brand of mail server.

To put this another way, I think app developers need to start thinking of Twitter the way they are really using it - as infrastructure.  Complaining about the current problem is a bit like a mechanic complaining that an auto parts store doesn't have a particular part in stock when ten other stores do.

Nick

Jesse Stay

Aug 9, 2009, 1:00:02 PM
to twitter-deve...@googlegroups.com
Just so I'm clear, my suggestion on PubSubHubbub isn't meant as a complaint; I'm hoping it starts a worthy and constructive discussion on standards-based real-time distribution.  I'd like to see Twitter survive the next DDoS, and I'd also like to see it become much easier for developers to embrace Twitter as a "utility" or "the pulse of the internet" (as TechCrunch puts it).  Building on open standards (or opening your own standards for other groups to adopt in their own environments) is the only way that will happen.  There are already great ways of doing this, so why re-invent the wheel when you could be contributing to a great effort that already exists?

Jesse

John Kalucki

Aug 11, 2009, 7:40:25 PM
to Twitter Development Talk
Persistent connections for notifications certainly allay my fears
around latency and throughput for well-behaved hubs.

I don't see any real cost savings, other than some minor bandwidth
savings on the fan-out. We'd incur significant development cost to
support this alternative approach, as we've already invested in a
solution that suits our needs. We'd probably want to build or modify
and integrate a client that fanned out to multiple hubs and defended
against slow and malicious hubs, as well as all the operational
features that we require. We'd need to maintain state on which hubs
to connect to, authentication credentials, queue depths, latency
statistics, and so
forth. This is all roughly as much infrastructure as the Streaming
API.
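
For a sense of scope, a very rough sketch of that kind of fan-out client,
with per-hub timeouts and bounded queues so one slow hub can't stall the
rest (hub URLs, depths, and timeouts are all hypothetical):

# Hypothetical fan-out sketch: per-hub bounded queues and worker threads
# with timeouts, so a slow or dead hub only affects its own queue.
import queue
import threading
import urllib.request

HUBS = ["https://hub-a.example.com/notify", "https://hub-b.example.com/notify"]
QUEUE_DEPTH = 1000   # assumed per-hub backlog limit
TIMEOUT_S = 5.0      # assumed per-request timeout

def hub_worker(hub_url, q):
    # One worker per hub: a slow or dead hub only stalls its own queue.
    while True:
        body = q.get()
        try:
            urllib.request.urlopen(hub_url, data=body, timeout=TIMEOUT_S).read()
        except Exception:
            pass  # a real client would retry, track latency stats, and disable bad hubs

queues = {}
for hub in HUBS:
    q = queue.Queue(maxsize=QUEUE_DEPTH)
    threading.Thread(target=hub_worker, args=(hub, q), daemon=True).start()
    queues[hub] = q

def fan_out(notification):
    # Called on the publish path; never blocks on a slow hub.
    for hub, q in queues.items():
        try:
            q.put_nowait(notification)
        except queue.Full:
            pass  # drop (or count) notifications for a hub that has fallen too far behind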

A major use of the Streaming API is filtered streams. I suspect that
this would be difficult, if not impossible, to support in
PubSubHubbub.
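
For context, a filtered stream is requested per connection with arbitrary
predicates such as keywords (again, the endpoint and parameter names here
are assumptions from the current docs); that per-subscriber predicate model
doesn't map cleanly onto PubSubHubbub's one-topic-per-feed design:

# Assumed /track resource: keywords are supplied per connection and only
# matching statuses come back on that connection's stream.
from urllib.request import Request, urlopen

req = Request("http://stream.twitter.com/track.json",
              data=b"track=ddos,pubsubhubbub")  # hypothetical keyword list
# Add Basic auth as in the earlier /follow sketch, then urlopen(req) and
# read line-delimited JSON from the response.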

The argument about standards isn't particularly compelling. The
protocol may be documented, but it isn't a ubiquitous, must-have
standard.

The Streaming API is probably the service that is easiest for us to
replicate for availability during a DDoS.

Technically, someone could build a service to consume from the
Streaming API and push into PubSubHubbub. This would be against the
EULA though.

The elephant in the room is control of the data, at any volume. This
isn't about 100% vs. 1%; rather, Twitter doesn't allow re-syndication.
With the Streaming API, we can require various implicit and explicit
licensing terms for various levels of access, disable malicious
actors, and so forth. By pushing out to a hub, we've ceded control of
these issues.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


John Kalucki

Aug 11, 2009, 7:49:51 PM
to Twitter Development Talk
I want to believe! If there are compelling arguments, we'll consider
this deeply. So far I'm still unconvinced.

The Streaming API is built only on open standards: HTTP, XML, JSON.
Anyone can build a Streaming API client in a few hours of work using
well-worn libraries and techniques. (Only a slight abuse of the HTTP
protocol is required.)
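
Stripped to its core, the whole client is a long-lived GET plus a read loop
over newline-delimited JSON; the endpoint below is illustrative, and the
"abuse" is simply that the response never ends:

# The whole client pattern in miniature: a response that never ends,
# read line by line. Endpoint and auth are illustrative placeholders.
import json
from urllib.request import urlopen

with urlopen("http://stream.example.com/sample.json") as stream:
    for line in stream:          # blocks until the next status arrives
        if line.strip():
            status = json.loads(line)
            print(status.get("text"))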

FWIW: In this current DDoS, publishing to hubs probably would have
fallen behind or run out of content anyway. The redundancy required
to ride out the DDoS is similar, if not the same, for either the
Streaming API or hub publishing. It's not an altogether compelling
argument.

Moving to PubSubHubbub seems like reinventing the Streaming API wheel.
This argument cuts both ways.

The most convincing arguments would be those supporting a broad use
case that we currently cannot support via the existing Twitter APIs.

