Will,
First, thanks for the new energy you and your team have shown over the
past few days. Every question is getting a detailed response, and valid
suggestions are being acted on. Thanks again.
On the issue discussed above: as you have mentioned here and in the
Digg comments (
http://digg.com/news/technology/introducing_digg_s_streaming_api_digg_about/20101029215107:e66adf324dfc4870be301da7b210d0d9#20101029223051:b830e45439794233bb1696a410e56c15
), removing the restrictions on history is really vital. However,
simply removing the history restriction and enabling lookups for time
periods X to Y alone would not, in my opinion, solve some critical
issues.
I believe this is a good time to remind you and the rest of the team
about a major, unresolved issue with both the website and the API.
1. Other than the stream, the main way to discover every single story
submitted is getUpcomingStories. However, Upcoming Stories does not
actually work that way; I initially thought this was a bug, but it now
seems to be intended behavior.
http://digg.com/news/technology/breaking_breaking_news/20101111191839:c713da83848e4ac98e9d3fefbc96a9be
This has not changed. Here is what I currently see as Upcoming (recent):
http://i.imgur.com/D7eor.png As you can see, the list is 40 minutes
old (probably a failed cron job or similar, which is understandable),
but it also contains only a VERY small sample of the submissions
actually being made. Is there any plan to fix this at all? If not,
then at least on the API side, can you add a method that exposes every
single story submitted, going back in time? I cannot resist pointing
out the irony in the comment thread above: Digg moved to a more
efficient backend in order to offer better features, but is now citing
performance issues to withhold features that were possible on the old,
weaker backend. That seems very irrational.
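To make the request concrete, here is a minimal sketch of how a client might page backward through the kind of "every submission" method I am asking for. The endpoint name (story.getAll) and its parameters are purely hypothetical assumptions for illustration; nothing like this exists in the API today.

```python
# Sketch of a hypothetical backfill method. The endpoint name
# "story.getAll" and its parameters are assumptions -- this is what a
# client would need, not what the API currently offers.
from urllib.parse import urlencode

API_ROOT = "http://services.digg.com/2.0/"

def build_backfill_url(max_date, count=100, offset=0):
    """Build a request URL for a hypothetical method that pages
    backward through every story ever submitted, newest first."""
    params = {"max_date": max_date, "count": count, "offset": offset}
    return API_ROOT + "story.getAll?" + urlencode(params)

url = build_backfill_url(1291139686, count=100, offset=0)
```

A client would then loop, incrementing offset (or lowering max_date) until the response comes back empty.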
2. There is also an easier workaround/fix for the problem above, and
it would be a BIG improvement to the stream in its own right.
Regardless of whether item 1 is fixed, it would make purely
stream-based applications far more robust. The Twitter stream uses a
"count" parameter much like "return_after" on the Digg stream, except
that count accepts both negative and positive values; negative values
act as a "catch-up" mechanism. If "return_after" were changed to
accept both, with a negative value making the stream first go back
over the last x items (where x is the magnitude of the negative
number), most of the problems described in point 1 could be
circumvented.
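The catch-up behavior I have in mind can be sketched as a toy simulation, modelled on Twitter's negative "count". The stream here is just a local list of item ids rather than the real Digg HTTP stream, and the positive-value branch is only a rough stand-in for the current "return_after" behavior.

```python
# Toy simulation of the proposed negative "return_after" semantics.
# "history" stands in for items already broadcast; "live" for items
# arriving after the client connects.

def open_stream(history, live, return_after=0):
    """Yield items from the simulated stream.

    return_after < 0: first replay the last |return_after| historical
    items (the catch-up), then continue with the live items.
    return_after > 0: skip that many live items (a rough stand-in for
    the current behavior).
    """
    if return_after < 0:
        # Catch-up: replay the most recent |return_after| items.
        for item in history[return_after:]:
            yield item
        for item in live:
            yield item
    else:
        for item in live[return_after:]:
            yield item

# A client that disconnected and missed items 8 and 9 asks for
# return_after=-2 and receives them before the live items resume.
history = [5, 6, 7, 8, 9]
received = list(open_stream(history, live=[10, 11], return_after=-2))
# received == [8, 9, 10, 11]
```

This is exactly why a reconnecting client would no longer need getUpcomingStories to recover missed submissions.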
3. In theory, search.search could also be used for this purpose
(finding all stories in a time range), but it does not actually work
that way. For example, this query
http://services.digg.com/2.0/search.search?max_date=1291139686&min_date=1291052700&sort=submit_date-asc&count=100&offset=4734
returns results entirely outside the requested time range. I have not
analyzed it in detail, but even the 450K+ submissions it reports for a
24-hour period seems far out of range.
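For what it's worth, the bug is easy to check client-side. Here is a small sketch that flags any returned story whose submit date falls outside the requested window; the "submit_date" field name follows the sort parameter in the query above, and the sample data is made up for illustration.

```python
# Client-side check for the time-range bug: given stories returned by
# search.search, find those whose submit_date lies outside the
# requested [min_date, max_date] window. With a correctly working API
# this list should always be empty.

def out_of_range(stories, min_date, max_date):
    """Return the stories whose submit_date is outside the window."""
    return [s for s in stories
            if not (min_date <= s["submit_date"] <= max_date)]

# Fabricated sample using the same window as the query above.
sample = [
    {"id": 1, "submit_date": 1291060000},  # inside the window
    {"id": 2, "submit_date": 1291200000},  # after max_date
]
bad = out_of_range(sample, min_date=1291052700, max_date=1291139686)
# bad == [{"id": 2, "submit_date": 1291200000}]
```

Running this over the actual response to the query above is how I noticed the out-of-range results.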
Once again, you guys are doing an awesome job. A few small changes and
tweaks in a few places would make a huge improvement.
Thanks!