How to check what pubsubhubbub.appspot.com is doing?


Tim Bray

Oct 19, 2009, 3:06:45 PM
to pubsub...@googlegroups.com
I *think* I'm pinging appspot when I update (it goes into the code
that does so and doesn't report any errors) but I look into my
logfiles and don't see any accesses from appspot since about 7PM last
night. Is there a way to find out when appspot thinks I've pinged it?

Hey, my publishing system is 2500 lines of Perl in one file that I
wrote in a few days in 2002, what could possibly go wrong...

-Tim

Brett Slatkin

Oct 19, 2009, 4:09:16 PM
to pubsub...@googlegroups.com
On Mon, Oct 19, 2009 at 3:06 PM, Tim Bray <tim...@gmail.com> wrote:
>
> I *think* I'm pinging appspot when I update (it goes into the code
> that does so and doesn't report any errors) but I look into my
> logfiles and don't see any accesses from appspot since about 7PM last
> night.  Is there a way to find out when appspot thinks I've pinged it?

Do you have any active subscribers? No subscribers = ping is a no-op.

There is this diagnostic form (at the bottom) for debugging a
subscriber: http://pubsubhubbub.appspot.com/subscribe

And this one for publishers (also at the bottom):
http://pubsubhubbub.appspot.com/publish

Lemme know if you can't get those to work.
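
For checking a publisher by hand instead of through the form, here is a minimal sketch of the ping itself. It assumes the publish parameters from the 0.2 spec (hub.mode=publish plus hub.url) and takes the hub URL from elsewhere in this thread; build_publish_ping is a hypothetical helper name, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hub endpoint as it appears elsewhere in this thread; adjust for another hub.
HUB_URL = "http://pubsubhubbub.appspot.com/"


def build_publish_ping(topic_url, hub_url=HUB_URL):
    """Build (but do not send) a 'new content' ping for one feed URL."""
    # Assumption: the hub expects form-encoded hub.mode and hub.url fields.
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url})
    return Request(
        hub_url,
        data=body.encode("ascii"),  # a non-empty body makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


ping = build_publish_ping("http://www.tbray.org/ongoing/ongoing.atom")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))
```

Passing the result to urllib.request.urlopen would send it. Comparing the printed body against what the publishing system actually sends is one way to see whether the ping is well formed before looking at the hub side.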


> Hey, my publishing system is 2500 lines of Perl in one file that I
> wrote in a few days in 2002, what could possibly go wrong...

=)

Pádraic Brady

Oct 19, 2009, 4:18:44 PM
to pubsub...@googlegroups.com
Hi Tim,

Not sure if it's relevant in terms of the reference hub, but my Subscriber has trouble with your feed. What it's doing is accepting your Atom feed URI "http://www.tbray.org/ongoing/ongoing.atom" and then attempting to verify the location of the feed, i.e. to rule out redirects etc. To do this, it parses the feed for an atom link with a rel attribute of "self". In the case of your feed, however, the href attribute seems to be empty - so my Subscriber passes back an exception stating that the feed's final location could not be verified.

I'd expect the Hub to be making a similar check in verifying the subscription details. It's a total stretch, but since this seems to breach the Atom specification it's possibly related. See http://tools.ietf.org/html/rfc4287#section-4.2.7

Paddy
 
Pádraic Brady

http://blog.astrumfutura.com
http://www.survivethedeepend.com
OpenID Europe Foundation Irish Representative




Pádraic Brady

Oct 19, 2009, 4:27:28 PM
to pubsub...@googlegroups.com
Hi Tim,

I might be off the rails saying the feed is invalid. Could anyone clarify the meaning of an empty href attribute in an Atom link? I always figured it was required to contain a URI if present. If that's not the case, me better fix me subscriber ;).



Brett Slatkin

Oct 19, 2009, 4:30:35 PM
to pubsub...@googlegroups.com
I'm doing this same "anti-aliasing" of feed URLs by using
//atom:feed/id elements. The "self" link (which is in the 0.2 spec)
doesn't work right in practice, methinks. Not sure how well we'll
define this for 0.3 though.

-Brett

Pádraic Brady

Oct 19, 2009, 4:33:16 PM
to pubsub...@googlegroups.com
It would have been the ideal workflow - but it looks like Atom feeds validate online without it, or with an empty href value attached. I'll still use it where it's available though; otherwise I've shifted to simply checking if the result is a valid feed (of whatever type). There might be some other way of tracking any movements from my HTTP client.



Pádraic Brady

Oct 19, 2009, 4:38:38 PM
to pubsub...@googlegroups.com
My own error checking updated - I'm getting this for Tim's feed with the reference hub:

REQUEST:

POST / HTTP/1.1
Host: pubsubhubbub.appspot.com
Connection: close
Accept-encoding: gzip, deflate
User-Agent: Zend_Feed_Pubsubhubbub_Subscriber/1.10.0dev
Content-Length: 272

hub.callback=http%3A%2F%2Fhub.mydomain.com%2Fcallback%2Ff6d1be9a2ef5217287b5e9fec6be16f2&hub.mode=subscribe&hub.topic=http%3A%2F%2Fwww.tbray.org%2Fongoing%2Fongoing.atom&hub.verify=sync&hub.verify=async&hub.verify_token=14080570384adccdd95d8aa4.303645251255984601

RESPONSE (PHP Error Array)

array
  0 =>
    array
      'response' =>
        object(Zend_Http_Response)[91]
          protected 'version' => string '1.1' (length=3)
          protected 'code' => int 409
          protected 'message' => string 'Conflict' (length=8)
          protected 'headers' =>
            array
              ...
          protected 'body' => string '24

Error trying to confirm subscription

0



' (length=47)
      'hubUrl' => string 'http://pubsubhubbub.appspot.com/' (length=32)

So no subscribing it seems. My other subscriptions are going through fine.
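
A note on reading the dump above: the '24' and '0' around the error text look like undecoded HTTP chunked transfer encoding (0x24 is 36, the length of the message, and the reported length of 47 matches if the line breaks are CRLF). A minimal sketch of decoding such a body follows; decode_chunked is a hypothetical helper, and the CRLF line endings are an assumption since the dump does not show them.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # Chunk size is hexadecimal; drop any ';extension' after it.
        size = int(raw[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # a zero-length chunk terminates the body
        start = line_end + 2
        out += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)


# Body reconstructed from the dump, assuming CRLF line endings.
raw_body = b"24\r\nError trying to confirm subscription\r\n0\r\n\r\n"
print(len(raw_body))
print(decode_chunked(raw_body).decode("ascii"))
```

So the hub's actual answer is the 409 status with the one-line message "Error trying to confirm subscription"; the surrounding numbers are framing, not content.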


Tim Bray

Oct 19, 2009, 5:15:49 PM
to pubsub...@googlegroups.com
On Mon, Oct 19, 2009 at 1:18 PM, Pádraic Brady <padrai...@yahoo.com> wrote:

> Not sure if it's relevant in terms of the reference hub, but my Subscriber
> has trouble with your feed. What it's doing is accepting your Atom feed URI
> "http://www.tbray.org/ongoing/ongoing.atom" and then attempting to verify
> the location of the feed, i.e. to rule out redirects etc. To do this, it
> parses the feed for an atom link with a rel attribute of "self". In the case
> of your feed, however, the href attribute seems to be empty - so my
> Subscriber passes back an exception stating that the feed's final location
> could not be verified.
>
> I'd expect the Hub to be making a similar check in verifying the
> subscription details. It's a total stretch, but since this seems to breach
> the Atom specification it's possibly related. See
> http://tools.ietf.org/html/rfc4287#section-4.2.7

I'm one of the designers of Atom, plus I wrote a chapter of 3986, and
I have to confess that my feed contains some stuff that exercises the
corner cases and gives clients trouble; you're not the first to have
problems. I do claim the feed is valid, and the online validators
agree. 4.2.7.1 makes it clear that the value of link@href is an IRI
reference. The actual value, from my feed, is

<link rel='self' href='' />

i.e. an empty string. A URI reference that doesn't begin with a
scheme is by definition relative (see RFC3986) and should be
absolutized in the standard way. Wherever you fetched this from,
tbray.org or (I presume) appspot.com, note that my feed's root element
has this attribute:

xml:base='http://www.tbray.org/ongoing/ongoing.atom'

Thus, per section 5.1.1 of 3986, you should absolutize the empty
string against the base URI and effectively get
href="http://www.tbray.org/ongoing/ongoing.atom"
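
The resolution described above can be sketched with the Python standard library. This is a simplified illustration, not how any particular parser does it: resolve_self_link is a hypothetical helper, it only honours xml:base on the root element (a full implementation would also handle xml:base on nested elements), and SAMPLE is a trimmed stand-in for the feed rather than the real document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_self_link(feed_xml, retrieval_url):
    """Return the absolute rel='self' URI of an Atom feed, or None if absent."""
    root = ET.fromstring(feed_xml)
    # xml:base may itself be relative, so resolve it against the retrieval URL.
    base = urljoin(retrieval_url, root.get(XML_BASE, ""))
    for link in root.findall("{%s}link" % ATOM_NS):
        if link.get("rel") == "self" and link.get("href") is not None:
            # An empty href is a relative reference that resolves to the base.
            return urljoin(base, link.get("href"))
    return None


SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'
      xml:base='http://www.tbray.org/ongoing/ongoing.atom'>
  <link rel='self' href='' />
</feed>"""

# The retrieval URL here is made up, to show that xml:base takes precedence.
print(resolve_self_link(SAMPLE, "http://example.org/mirror/ongoing.atom"))
```

The printed value is the xml:base URL, regardless of where the document was fetched from, which is the behaviour the relative-URI argument depends on.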

In general relative URIs are A Good Thing for publishing systems
because it means you can move whole file trees around without breaking
things; i.e. the staging system where I write my blog isn't on
tbray.org, but everything Just Works because the base is set correctly
and all the URIs are relative.

Pardon my pedantry, but I do think this should work.

-Tim

Pádraic Brady

Oct 19, 2009, 5:29:52 PM
to pubsub...@googlegroups.com
No problem :). I'm going to look into the parsing library and see if I can figure out what it's missing.

Tim Bray

Oct 19, 2009, 5:49:51 PM
to pubsub...@googlegroups.com
On Mon, Oct 19, 2009 at 1:30 PM, Brett Slatkin <bsla...@gmail.com> wrote:
>
> I'm doing this same "anti-aliasing" of feed URLs by using
> //atom:feed/id elements. The "self" link (which is in the 0.2 spec)
> doesn't work right in practice, methinks. Not sure how well we'll
> define this for 0.3 though.

Really? I thought the requirement to use the 'self' link was smart,
precisely for the reason given in the spec. What's the problem? -Tim

Alexis Richardson

Oct 19, 2009, 6:00:34 PM
to pubsub...@googlegroups.com
I'd like to throw in a +1 for self too.

Alexis Richardson

Oct 19, 2009, 6:01:01 PM
to pubsub...@googlegroups.com
That is: for "self" not self. <sigh>

Pádraic Brady

Oct 19, 2009, 6:03:10 PM
to pubsub...@googlegroups.com
Probably because parsers need more bug reports ;). I agree - since this was a parser error (fixing now) it's likely not as unreliable as I first thought. Let parsers worry about their own compliance.

Jay Rossiter

Oct 19, 2009, 6:34:50 PM
to pubsub...@googlegroups.com

    Another +1 from me unless a reasonable, defensible alternative is presented.
--

Jay Rossiter | Software Engineer/System Administrator
Pioneering RSS Advertising Solutions

jros...@pheedo.com | Phone: 503.896.6187 | Fax: 503.235.2216
Website: www.pheedo.com | RSS: www.pheedo.info/index.xml

Brett Slatkin

Oct 21, 2009, 5:33:14 PM
to pubsub...@googlegroups.com
In practice, "self" could be one of 10 URLs. Most CMSes out there will
return self equal to whatever URL you happened to fetch the feed from.
A great example is Google Reader Shared Items. Here's my feed and
its variants. Each one is essentially the same feed, but with small
differences in the URL:

http://www.google.com/reader/public/atom/user%2F10577182142674084604%2Fstate%2Fcom.google%2Fbroadcast
http://www.google.com/reader/public/atom/user%2F10577182142674084604%2Fstate%2Fcom.google%2Fbroadcast?ann=true
http://www.google.co.jp/reader/public/atom/user%2F10577182142674084604%2Fstate%2Fcom.google%2Fbroadcast?ann=true
http://www.google.co.uk/reader/public/atom/user%2F10577182142674084604%2Fstate%2Fcom.google%2Fbroadcast?ann=false
...

It gets worse, considering Google has 100+ country-specific domains.
So my point is, in the wild, in practice, nobody uses "self"
correctly. We're setting ourselves up for failure if we expect
publishers and subscribers alike to properly serve and follow "self"
links. Resolving the atom:ids (which are the same for all of the feeds
above) seems to work a lot better in practice.
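
The atom:id approach can be sketched as grouping fetched documents by their feed-level id. This is an illustration under stated assumptions, not the reference hub's code: group_by_atom_id is a hypothetical helper, the documents and example.org URLs are made up, and fetching is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM_ID = "{http://www.w3.org/2005/Atom}id"


def group_by_atom_id(fetched):
    """Map each feed-level atom:id to the set of URLs that served it.

    `fetched` is a dict of {url: feed XML text}.
    """
    groups = defaultdict(set)
    for url, xml_text in fetched.items():
        # Only the id that is a direct child of the root element counts;
        # entry-level ids are ignored.
        feed_id = ET.fromstring(xml_text).findtext(ATOM_ID)
        if feed_id:
            groups[feed_id.strip()].add(url)
    return dict(groups)


# Hypothetical documents: two URL variants serving the same feed.
DOC = "<feed xmlns='http://www.w3.org/2005/Atom'><id>tag:example.org,2009:shared</id></feed>"
fetched = {
    "http://example.org/feed": DOC,
    "http://example.org/feed?ann=true": DOC,
}
print(group_by_atom_id(fetched))
```

Both URL variants end up under one key, so a ping for either can be treated as a ping for the same topic. Note that this only works for documents that have already been fetched, since the id itself says nothing about where the feed lives.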

Does that make sense?

Jay Rossiter

Oct 21, 2009, 6:39:31 PM
to pubsub...@googlegroups.com

    I think what you're documenting is an example of bad adherence to the Atom spec more than anything else.  The self link should be the same in every document as well, because it's the "preferred URI for retrieving Atom Feed Documents representing this Atom feed".  It's the responsibility of the publisher (or publishing service) to ensure that the self link is identical for every URL at which the feed can be retrieved.

    Per the spec, atom:id IRIs are not dereferenceable - they're permanent, even if the feed changes domains.  How does the hub handle a subscription request for an atom:id that it has never seen before?  It doesn't know where to go to pull that feed's contents if the publisher has not set up pinging.

    In my case, using atom:id adds migration problems for RSS support.  When feeds come into Pheedo, they're (mostly) not hub enabled.  We are the ones enabling hub support on behalf of the user at their request.  Since atom:id is by definition permanent, that means it really needs to be controlled by the publisher, not by an intermediary service.  If the publisher were to leave Pheedo, or inject another intermediary service ahead of us in the processing chain, the atom:id would likely change (by disappearing entirely if they left, or by the other service having to add its own atom:id for the same reasons that we did.)

    Long and short - self links make sense, both to users and to the hubs.  They're meaningful.  atom:id, other than being unique to the feed, carries no information about the feed.