Simulating Hacker News Homepage (again)

Michael Mahemoff

Oct 22, 2011, 9:29:32 AM
to hnse...@googlegroups.com
This was raised earlier [1], but the URL mentioned [2] is returning results that are days to weeks old. I've tried tweaking the "72" figure all the way down to ~1, but it still returns results that are 2 days old.

I'm aware there are the RSS feeds and the new bigrss [3] (thanks Andres), but they're not so useful for my Reader app [4], because they don't include points or comment counts. I had previously used the unofficial iHackerNews API, but found it was timing out more often than ideal.

lightyrs

Oct 22, 2011, 5:06:06 PM
to hnse...@googlegroups.com
Agreed, this is a major problem. Andres has been great and very responsive; however, this needs to be addressed.

lightyrs

Oct 22, 2011, 5:37:45 PM
to hnse...@googlegroups.com
I think the HNSearch homepage should always have an up-to-date one-liner of the exact query needed to reproduce the HN homepage. It's been a weird cat-and-mouse game thus far and I don't think it serves anyone's best interests. If PG really wants the API to replace scrapers, we need this information to be easily accessible. Andres, please consider this. That being said, I do really appreciate all of the time and effort that you have put into the API thus far.

Thanks,

Harris

Harish Agarwal

Oct 22, 2011, 6:00:17 PM
to hnse...@googlegroups.com
Hey all,

We've been making major infrastructure changes lately and the indexer
has been going down frequently. Sorry about that.

Moving forward, we are going to focus on making the indexer more
robust. Hopefully this will help prevent these sorts of issues in the
future.

Harish

lightyrs

Oct 22, 2011, 6:13:59 PM
to hnse...@googlegroups.com
Thanks Harish.  Noticing improvements already.

Andres Morey

Oct 23, 2011, 1:40:57 AM
to hnse...@googlegroups.com
I added methods to insert HNSearch item ids into the HN feeds here:


By parsing the feeds you can obtain item ids and then fetch them using the get_multi method:

http://api.thriftdb.com/api.hnsearch.com/items/_bulk/get_multi?ids=3145237-337e2,3145118-2e3b9,3144628-86545
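For anyone scripting this, here is a minimal Python sketch of building that get_multi URL from a list of item ids. The base URL and the comma-separated ids parameter are taken from the example above; the helper name get_multi_url is only illustrative, and nothing here is confirmed beyond what that one URL shows.

```python
# Base endpoint copied from the example URL in this thread.
BASE_URL = "http://api.thriftdb.com/api.hnsearch.com/items/_bulk/get_multi"


def get_multi_url(item_ids):
    """Build a get_multi URL for a list of HNSearch item ids (hypothetical helper)."""
    if not item_ids:
        raise ValueError("at least one item id is required")
    # Ids are joined with plain commas, matching the example URL.
    return BASE_URL + "?ids=" + ",".join(item_ids)


ids = ["3145237-337e2", "3145118-2e3b9", "3144628-86545"]
print(get_multi_url(ids))
```

Batching all the ids from one feed into a single request keeps the round trips down to one per refresh.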

I'm not sure if this is the best long term solution so keep track of this thread for any changes. Let me know if that helps.

Andres

Michael Mahemoff

Oct 23, 2011, 12:13:19 PM
to hnse...@googlegroups.com
Many thanks Andres.

I'm just wondering if you'd consider adding the auxiliary fields, since it's fairly small as feeds go (containing only a title, not the content). author and create_ts are standard RSS fields anyway, so the only non-standard data would be num_comments and points.

I could certainly get by with a batch fetch, but I suspect those few extra fields will save a lot of round-tripping, making users happier. (In this case, I'd be using Google's Ajax Feed API, so it would also cut the load from HNSearch's servers altogether.)

Either way, thanks for adding the IDs.

Andres Morey

Oct 23, 2011, 1:22:31 PM
to hnse...@googlegroups.com
My only hesitation in adding metadata to the rss feed is that the HNSearch server has to make an extra request to ThriftDB and, in terms of latency, it's no faster for HNSearch to make the call than any other machine running on AWS.

Also, HNSearch data will always lag behind HN (by at least 5 minutes) so this doesn't seem like a good long term solution. I have some ideas on better solutions but I'll need to think about it some more.

Anyways, I added <username>, <create_ts>, <num_comments>, <points> to the rss feeds (where available):


Andres
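
A rough sketch of reading those extra fields in Python, for illustration only. The four element names come from the message above; the sample feed, its overall layout, and the timestamp format are assumptions, since the thread does not show the actual feed. Because the fields are only present "where available", missing ones are returned as None.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: only the four extra element names come from the thread.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Example story</title><link>http://example.com/</link>
<username>alice</username><create_ts>2011-10-23T12:00:00Z</create_ts>
<num_comments>12</num_comments><points>34</points></item>
<item><title>Story without metadata</title><link>http://example.org/</link></item>
</channel></rss>"""


def _text(item, tag):
    """Return the text of a child element, or None if it is absent or empty."""
    element = item.find(tag)
    return element.text if element is not None and element.text else None


def _number(item, tag):
    """Return a child element's text as an int, or None if it is absent."""
    value = _text(item, tag)
    return int(value) if value is not None else None


def parse_items(xml_text):
    """Extract title, link, and the extra HNSearch fields from each <item>."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": _text(item, "title"),
            "link": _text(item, "link"),
            "username": _text(item, "username"),
            "create_ts": _text(item, "create_ts"),  # kept as a string; format unknown
            "num_comments": _number(item, "num_comments"),
            "points": _number(item, "points"),
        }
        for item in root.iter("item")
    ]


for story in parse_items(SAMPLE_FEED):
    print(story["title"], story["points"], story["num_comments"])
```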

Michael Mahemoff

Oct 23, 2011, 1:33:52 PM
to hnse...@googlegroups.com
Wow, that's awesome, thanks! Looks like this is pretty much enough to replicate the homepage.

At least for my app, I can certainly deal with a lag and would be perfectly fine if the RSS were cached for a few minutes. Maybe you could just cache the RSS and, longer-term, build in XMPP/PSHB for the (much fewer, I suspect) apps that really need real-time info.
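
To illustrate the caching idea, here is a minimal time-based cache sketch in Python. It is not how HNSearch works; the class name CachedFeed, the 300-second default, and the fetch callable are all assumptions made for the example.

```python
import time


class CachedFeed:
    """Serve a cached feed body and refetch it only after ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable that returns the fresh feed body
        self._ttl = ttl_seconds      # "a few minutes"; 300 is an assumed value
        self._clock = clock          # injectable so the cache can be tested
        self._body = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            # Only this path hits the backing store.
            self._body = self._fetch()
            self._fetched_at = now
        return self._body


feed = CachedFeed(lambda: "<rss>...</rss>")
print(feed.get())  # first call fetches
print(feed.get())  # second call is served from the cache
```

With something like this in front of the feed, the extra ThriftDB request happens at most once per cache window, however many clients poll.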

Michael Mahemoff

Oct 24, 2011, 5:06:44 PM
to hnse...@googlegroups.com
Got it working now against the new RSS - you can see it in use here:
https://github.com/mahemoff/hackernews/blob/2a258f90fb91543ca6ce1aa2162e88b19a9264f4/index.coffee#L48
And live at:

If anyone else is using the Google Ajax Feed API, one gotcha is it caches aggressively, which is why the above code includes a random nocache param.
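
The same cache-busting trick, sketched in Python rather than the CoffeeScript in the linked repository. The parameter name nocache comes from the message above; the helper name, the example feed URL, and the range of the random number are assumptions.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_nocache(url, rng=random):
    """Append a random nocache query parameter so a caching proxy sees a new URL."""
    parts = urlsplit(url)
    # Keep any query parameters that are already on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("nocache", str(rng.randint(0, 10**9))))
    return urlunsplit(parts._replace(query=urlencode(query)))


# example.com stands in for the real feed URL, which is not shown in the thread.
print(add_nocache("http://example.com/feed.rss"))
```

This defeats the intermediate cache entirely, so it is worth pairing with a sensible polling interval on the client.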

Andres Morey

Oct 24, 2011, 10:06:34 PM
to hnse...@googlegroups.com
Hmm - I thought the HN RSS feed returned the stories on the front page. Are you trying to recreate the HN homepage?

Andres

Michael Mahemoff

Oct 24, 2011, 10:41:23 PM
to hnse...@googlegroups.com
Not sure what you mean? I'm trying to show the same stories as the homepage, but with a different presentation. Also, the stories are sorted by create_ts on my app.
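
As a rough illustration of that re-sorting step, here is a short Python sketch that orders parsed stories by their create_ts field. The newest-first direction, the ISO-style timestamp format, and the sample stories are assumptions; the thread states neither the sort direction nor the timestamp format.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever the feed actually emits.
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def sort_newest_first(stories, ts_format=TS_FORMAT):
    """Return stories ordered by create_ts, most recent first."""
    return sorted(
        stories,
        key=lambda story: datetime.strptime(story["create_ts"], ts_format),
        reverse=True,
    )


# Hypothetical stories in homepage (rank) order.
homepage = [
    {"title": "Top story", "create_ts": "2011-10-24T08:00:00Z"},
    {"title": "Fresh story", "create_ts": "2011-10-24T21:30:00Z"},
    {"title": "Older story", "create_ts": "2011-10-23T17:45:00Z"},
]
for story in sort_newest_first(homepage):
    print(story["create_ts"], story["title"])
```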

Andres Morey

Oct 24, 2011, 11:17:04 PM
to hnse...@googlegroups.com
Never mind - I made a request to the RSS URL and got some cached data.

Andres