7.4 aggregated Content distribution

Tim Bray

Oct 18, 2009, 12:50:19 AM

to Pubsubhubbub

Seems to me that the client model for processing a single vs.
aggregated distribution might be quite a bit different. And also, the
original upstream feed might have used entry/source already (this
makes me nervous about the whole notion of PuSH co-opting <source> for
its own purposes).

I was wondering if you might want to put an extension element here as
a child of feed, before the entries start, in a pubsubhubbub
namespace, saying "the following are aggregated by the hub". You can
do this safely because Atom has MustIgnore on markup it doesn't
recognize (hint hint).
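A minimal sketch of the MustIgnore behaviour described above, using Python's standard-library ElementTree. The marker element name `aggregated` and the namespace URI `urn:example:pubsubhubbub` are hypothetical placeholders, not defined by any spec. A subscriber that knows the marker can branch on it; one that doesn't never looks for it, and the entries parse exactly the same either way.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Hypothetical namespace and marker element; not defined by any spec.
PSH = "urn:example:pubsubhubbub"


def read_notification(xml_text: str) -> tuple[bool, list[str]]:
    """Return (aggregated?, entry ids) for an Atom notification body."""
    feed = ET.fromstring(xml_text)
    # A subscriber that knows about the marker can check for it...
    aggregated = feed.find(f"{{{PSH}}}aggregated") is not None
    # ...while entry processing is identical either way: an unknown
    # extension element is simply never looked at (MustIgnore).
    ids = [e.findtext(f"{{{ATOM}}}id") for e in feed.findall(f"{{{ATOM}}}entry")]
    return aggregated, ids


plain = (
    f'<feed xmlns="{ATOM}">'
    "<entry><id>tag:example.org,2009:1</id></entry>"
    "</feed>"
)
marked = (
    f'<feed xmlns="{ATOM}" xmlns:psh="{PSH}">'
    "<psh:aggregated/>"
    "<entry><id>tag:example.org,2009:1</id></entry>"
    "</feed>"
)
print(read_notification(plain))   # (False, ['tag:example.org,2009:1'])
print(read_notification(marked))  # (True, ['tag:example.org,2009:1'])
```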

[Um, when I read this section, there's a little voice in the back of
my head shouting "YAGNI!"]

-T

R P

Oct 18, 2009, 11:41:10 AM

to pubsub...@googlegroups.com

I totally agree with you on the "YAGNI" bit. For low- to medium-volume subscribers (receiving updates at most every few minutes from any given hub), aggregated distribution doesn't really help at all, but still complicates the implementation. Since it's only really useful for big subscribers, I think aggregated distribution should be made optional. It could easily be controlled by an optional parameter at subscribe-time, without breaking compatibility.
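A sketch of what such an opt-in could look like in a form-encoded subscribe request. `hub.mode`, `hub.callback` and `hub.topic` are the standard subscription parameters; `hub.aggregate` is a hypothetical name for the proposed flag, and other parameters a given spec revision requires are omitted for brevity. It assumes hubs ignore parameters they don't recognize, which is what would keep the change backward compatible.

```python
from urllib.parse import parse_qs, urlencode


def subscription_body(callback: str, topic: str, want_aggregated: bool = False) -> str:
    """Build a form-encoded subscribe request body (other required params omitted)."""
    params = {
        "hub.mode": "subscribe",
        "hub.callback": callback,
        "hub.topic": topic,
    }
    # Hypothetical opt-in flag, NOT part of the PubSubHubbub spec.
    # Subscribers that don't ask for aggregation send exactly what they do today.
    if want_aggregated:
        params["hub.aggregate"] = "true"
    return urlencode(params)


body = subscription_body(
    "http://subscriber.example/cb",
    "http://publisher.example/feed",
    want_aggregated=True,
)
print(parse_qs(body)["hub.aggregate"])  # ['true']
```

Making aggregation opt-in keeps the default path (one notification per feed) as simple as it is now, which is the low-volume case Ravi describes.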

--Ravi

Brett Slatkin

Oct 21, 2009, 5:13:08 PM

to pubsub...@googlegroups.com

On Sun, Oct 18, 2009 at 12:50 AM, Tim Bray <tim...@gmail.com> wrote:
>
> Seems to me that the client model for processing a single vs.
> aggregated distribution might be quite a bit different. And also, the
> original upstream feed might have used entry/source already (this
> makes me nervous about the whole notion of PuSH co-opting <source> for
> its own purposes).
>
> I was wondering if you might want to put an extension element here as
> a child of feed, before the entries start, in a pubsubhubbub
> namespace, saying "the following are aggregated by the hub". You can
> do this safely because Atom has MustIgnore on markup it doesn't
> recognize (hint hint).

This is the first time I've heard someone point this out. I believe
the atom:source element was specifically included in that spec for the
purpose that PubSubHubbub is using it. Bob Wyman seemed to indicate
the same thing too in some other email threads on this list. Could you
clarify how this is "co-opting" the source element?

> [Um, when I read this section, there's a little voice in the back of
> my head shouting "YAGNI!"]

I disagree with "YAGNI" here. Take world-wide RSS traffic. Multiply by
1,000,000. We will need aggregated delivery to fully utilize links.

-Brett

Tim Bray

Oct 21, 2009, 5:34:33 PM

to pubsub...@googlegroups.com

On Wed, Oct 21, 2009 at 2:13 PM, Brett Slatkin <bsla...@gmail.com> wrote:

>> Seems to me that the client model for processing a single vs.
>> aggregated distribution might be quite a bit different. And also, the
>> original upstream feed might have used entry/source already (this
>> makes me nervous about the whole notion of PuSH co-opting <source> for
>> its own purposes).

...

> This is the first time I've heard someone point this out. I believe
> the atom:source element was specifically included in that spec for the
> purpose that PubSubHubbub is using it. Bob Wyman seemed to indicate
> the same thing too in some other email threads on this list. Could you
> clarify how this is "co-opting" the source element?

Well, consider the popular feed at
http://planet.intertwingly.net/atom.xml - it's already an aggregation,
produced by the well-known "planet" software, and makes heavy use of
the <source> element. What happens when PSHB tries to combine this
and several other feeds?

It's arguably a shortcoming of atom:source that it doesn't handle
multiple levels of nesting. But I also think it's a mistake for PSHB
to assume that it's the only link in the aggregation chain.

>> [Um, when I read this section, there's a little voice in the back of
>> my head shouting "YAGNI!"]
>
> I disagree with "YAGNI" here. Take world-wide RSS traffic. Multiply by
> 1,000,000. We will need aggregated delivery to fully utilize links.

You're entitled to your opinion, but I think you should get it working
first and discover your bottlenecks by experience, rather than invent
something to fix a problem you're pretty sure you're going to have.
Following this advice is easier for me than for other people because
repeated humiliation has taught me that I'm not smart enough to
predict where the choke points are going to be in things that approach
internet scale.

Anyhow, the <source> element strikes me as a lousy solution. It's
going to bulk up your feeds pretty severely, unless I'm missing
something. Why not just have a

<pubsubhubbub:divider src="http://wherever..." />

separator element here and there among the entries? Or jam a bunch of
feeds together with multipart/related (works great, lots of
libraries)? -T
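The multipart/related alternative can be sketched with Python's standard `email` package: each upstream feed travels as its own untouched `application/atom+xml` part, so nothing inside an entry (including any existing atom:source) needs rewriting. Labelling each part with a `Content-Location` header holding the feed URL is an assumption made for this sketch, not something proposed in the thread.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def bundle_feeds(feeds: dict[str, bytes]) -> bytes:
    """Pack several complete Atom documents into one multipart/related body."""
    outer = MIMEMultipart("related", type="application/atom+xml")
    for url, document in feeds.items():
        # Each feed document is carried verbatim (base64 transfer encoding).
        part = MIMEApplication(document, _subtype="atom+xml")
        part["Content-Location"] = url  # which upstream feed this part came from
        outer.attach(part)
    return outer.as_bytes()


def unbundle_feeds(body: bytes) -> dict[str, bytes]:
    """Recover {feed URL: Atom document} from a bundled body."""
    message = message_from_bytes(body)
    return {
        part["Content-Location"]: part.get_payload(decode=True)
        for part in message.get_payload()
    }
```

A subscriber then hands each recovered document to the same code path it already uses for single-feed notifications.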

Bob Wyman

Oct 21, 2009, 6:09:20 PM

to pubsub...@googlegroups.com, Tim Bray

On Wed, Oct 21, 2009 at 5:34 PM, Tim Bray <tim...@gmail.com> wrote:
> It's arguably a shortcoming of atom:source that it
> doesn't handle multiple levels of nesting.

This is a known limitation of Atom that was discussed by the Working Group at some length and on multiple occasions. Each time it was discussed, the WG decided to leave things as they were. (The general issue discussed went under the tag "provenance".)

The general issue is that when you copy an entry from any feed document other than that feed document whose metadata is in the entry's atom:source, there is no way to indicate from which feed document you copied the entry unless you insert some extension element. Of course, if you add an extension element, you might be breaking a signature... So, you probably don't want to do that.

One line of reasoning says that this is not a big problem. As is often noted, Atom is supposed to be about entries, not feeds. Thus, the *important* feed to associate with an entry is its source feed -- not the feed where you happened to stumble across the entry. Atom, as defined, satisfies these people.

Others will argue that it *is* important to know not only the source feed but *also* where you found the entry. This tends to lead people to want to record "provenance" or insert into an entry a list of feeds and/or locations (entries can live outside a feed) that record all the places that an entry has been found as it is copied around the network. Atom, as defined, does not satisfy these people.

bob wyman

Ravi Pinjala

Oct 21, 2009, 8:58:17 PM

to pubsub...@googlegroups.com

I can't really judge either way about YAGNI, but certainly IAGNI in this
case. :) For any subscriber not getting multiple updates per minute
(because they're only subscribed to hundreds of feeds, say, and not tens
of thousands), aggregated delivery introduces some complexity to
parsing, and doesn't really give any benefits.

It really depends on how PuSH is used in practice. If the majority of
the bandwidth is used by a few clients subscribing to many feeds, then
aggregated delivery could be really important. If the majority of
bandwidth is going to a lot of clients, each subscribing to few feeds,
then it really won't be. I personally think that once personal news
aggregators support PuSH we'll see a huge shift in the latter direction,
but that's really just guessing on my part.

I still think aggregated distribution should be made optional, and
controlled by an extra parameter when subscribing.

--Ravi

James Holderness

Oct 24, 2009, 7:21:41 AM

to Pubsubhubbub

On Oct 21, 10:13 pm, Brett Slatkin <bslat...@gmail.com> wrote:

> On Sun, Oct 18, 2009 at 12:50 AM, Tim Bray <timb...@gmail.com> wrote:
> > Seems to me that the client model for processing a single vs.
> > aggregated distribution might be quite a bit different. And also, the
> > original upstream feed might have used entry/source already (this
> > makes me nervous about the whole notion of PuSH co-opting <source> for
> > its own purposes).
>

> This is the first time I've heard someone point this out. I believe
> the atom:source element was specifically included in that spec for the
> purpose that PubSubHubbub is using it.

Have you even read the Atom spec? From section 4.2.11:

"If an atom:entry is copied from one feed into another feed, then
the source atom:feed's metadata MAY be preserved within the
copied entry by adding an atom:source child element, IF IT IS
NOT ALREADY PRESENT IN THE ENTRY" (emphasis mine)
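The rule quoted above can be sketched in Python with ElementTree (the function name `copy_entry` is illustrative). An aggregator copying an entry adds atom:source only when none exists; an entry that already carries one, like those in an existing "planet" aggregation, keeps its original source untouched. Copying every non-entry child of the feed into atom:source is one simple choice; RFC 4287 recommends including at least atom:id, atom:title and atom:updated.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def copy_entry(entry: ET.Element, from_feed: ET.Element) -> ET.Element:
    """Copy an entry out of from_feed, following RFC 4287 section 4.2.11."""
    copied = copy.deepcopy(entry)
    if copied.find(f"{{{ATOM}}}source") is None:
        # No atom:source yet: record the feed we copied the entry from.
        source = ET.SubElement(copied, f"{{{ATOM}}}source")
        for child in from_feed:
            if child.tag != f"{{{ATOM}}}entry":  # feed-level metadata only
                source.append(copy.deepcopy(child))
    # If atom:source was already present it is left exactly as it was.
    return copied
```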

When you casually wipe out an existing source element, you're not
only erasing the true source of the entry - in some cases you're
potentially erasing the actual authorship of the entry too. I
can't believe you think that's acceptable!

Google Reader's "shared items" feeds do the same thing. In
that case you go so far as to add your own author element with
the name "(author unknown)". No, the author wasn't "unknown" -
you just deleted them.

You people need to stop pulling crap like this. It's evil.

Pádraic Brady

Oct 24, 2009, 10:24:23 AM

to pubsub...@googlegroups.com

Whoa ;). Tone it down some. Unless everyone on the mailing list is completely confident they are an Atom expert with an eidetic memory, there's always room for others to act as instructors when we do something potentially wrong. I just fixed a few problems I overlooked in the Atom spec (empty hrefs equating to a "/" relative URI), and I consider myself somewhat familiar with the specification. There's no need to fly off the handle using terms like "evil" and "pulling crap" as if it were a deliberate attempt to subvert the specification.

Paddy

Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
OpenID Europe Foundation Irish Representative


John Panzer

Oct 24, 2009, 12:45:17 PM

to pubsub...@googlegroups.com

Agreed to keeping the tone civil -- assume good intentions until proven otherwise; it can only help.

Mea culpa for not noticing this issue with PubSubHubbub. The Atom spec didn't envision this use case and so atom:source is almost, but not quite, what's needed -- thus the confusion is understandable.

The right thing to do IMHO is to add something like a psh:provenance element that is just like atom:source but tracks the most recent context of the entry.
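A sketch of that idea in Python with ElementTree. The element name `provenance`, its namespace URI and its child layout are all hypothetical; nothing like this was specified in the thread. Following "most recent context", each hop replaces the previous record rather than appending to a chain, and atom:source is never touched.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
PSH = "urn:example:pubsubhubbub"  # hypothetical namespace


def set_provenance(entry: ET.Element, feed_id: str, feed_url: str) -> None:
    """Record the feed this entry was most recently taken from."""
    tag = f"{{{PSH}}}provenance"  # hypothetical element name
    # Only the most recent hop is kept, so drop any earlier record.
    for old in entry.findall(tag):
        entry.remove(old)
    prov = ET.SubElement(entry, tag)
    ET.SubElement(prov, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(prov, f"{{{ATOM}}}link", {"rel": "self", "href": feed_url})
    # atom:source, if any, is deliberately left alone.
```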

--
John Panzer / Google
jpa...@google.com / abstractioneer.org / @jpanzer

Tim Bray

Oct 24, 2009, 12:52:40 PM

to pubsub...@googlegroups.com

On Sat, Oct 24, 2009 at 9:45 AM, John Panzer <jpa...@google.com> wrote:

> The right thing to do IMHO is to add something like a psh:provenance element
> that is just like atom:source but tracks the most recent context of the
> entry.

The right thing to do IMHO is not aggregate at the PubSubHubbub level
until you've proved that (a) you have to and (b) multipart/related
won't cut it.

-T

Bob Wyman

Oct 24, 2009, 3:26:23 PM

to pubsub...@googlegroups.com

Actually, the PSHB use case *was* frequently discussed in the Atom WG... The reason for this is that what PSHB does is provide pretty much what FeedMesh was intended to provide - a topic-based subset of the content-based system that Pubsub.com provided.

An atom:source, if present in an entry, should never be modified or overwritten by an aggregator. An aggregator should only add atom:source if it isn't already present.

If you want to show provenance, you need to add an extension element. As pointed out in an earlier message, the WG was aware that provenance was useful but we couldn't get consensus on how to record it.

Please, if you define a provenance element, include in the definition a requirement to remove that element prior to canonicalisation when verifying signatures...
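That ordering can be sketched in Python: delete the (hypothetical) provenance element first, then canonicalise what remains, so a hub-added record cannot change the digest being checked. `ET.canonicalize` (Python 3.8+) implements C14N 2.0; a real XML Signature names its own canonicalization method, so this illustrates the ordering only and is not a drop-in verifier. Comparing the digest against an actual signature is omitted.

```python
import hashlib
import xml.etree.ElementTree as ET

PSH = "urn:example:pubsubhubbub"  # hypothetical namespace


def signed_digest(entry_xml: str) -> str:
    """SHA-256 of the canonical entry after provenance records are removed."""
    entry = ET.fromstring(entry_xml)
    # Step 1: strip hub-added provenance (direct children of the entry).
    for prov in entry.findall(f"{{{PSH}}}provenance"):
        entry.remove(prov)
    # Step 2: only now canonicalise (C14N 2.0 here) and hash.
    canonical = ET.canonicalize(ET.tostring(entry, encoding="unicode"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```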

bob wyman


John Panzer

Oct 24, 2009, 6:27:44 PM

to pubsub...@googlegroups.com

Agreed on proving need first. I have some war stories about
multipart/related and batching that may be relevant; I'm skeptical
about ease of subscriber implementation.


John Panzer

Oct 24, 2009, 6:29:37 PM

to pubsub...@googlegroups.com

Yep, the WG envisioned this, the final spec did not. :)


Brett Slatkin

Oct 24, 2009, 6:47:53 PM

to pubsub...@googlegroups.com

Cool. This thread is a great example of peer review. I'll file an issue in the bug tracker to distill this. Thanks everyone!

-Brett

