Rate limit & Python Digg


Ben

Nov 24, 2010, 5:05:44 AM
to Digg API
I'm doing a study on Digg and I'd like to get the story id, story title, and topic.
I use the search.search method to get that information for all stories in a specific time period
(one week, about 1.6 million stories).

But I often receive error 1068.

Is there a method to get story id, title, and topic with Python Digg?

Or is there another way to detect the rate limit using Python Digg?

Will

Nov 27, 2010, 12:41:52 PM
to Digg API
If you're looking to collect the stories for any given week, as opposed
to a specific historical week, then the /2.0/stream API is probably the
easiest way to gather the complete set of stories for a period of time:
the Streaming API isn't subject to rate limiting and ensures you are
getting all stories.
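The parsing side of such a stream consumer can be sketched in a few lines. This assumes the stream emits one JSON object per line with blank keep-alive lines in between, and that the fields are named story_id, title, and topic; the actual event schema may differ, so check these names against a real event.

```python
import json

def parse_stream_line(line):
    """Parse one newline-delimited JSON event and keep only the fields
    the study needs: story id, title, and topic. The field names here
    are assumptions about the event schema."""
    event = json.loads(line)
    return {
        "id": event.get("story_id"),
        "title": event.get("title"),
        "topic": event.get("topic"),
    }

def collect_stories(lines):
    """Collect parsed stories from an iterable of raw stream lines,
    skipping blank keep-alive lines."""
    stories = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stories.append(parse_stream_line(line))
    return stories
```

In practice `lines` would be a long-lived HTTP response from /2.0/stream iterated line by line; the functions above only cover the parsing so the network layer can be swapped freely.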

Regarding the ratelimiting, each HTTP response includes these headers:

X-RateLimit-Current:97
X-RateLimit-Max:5000
X-RateLimit-Reset:3184

Current is the number of requests made in the current timeframe, Max is
the maximum number of requests allowed per timeframe, and Reset is the
number of seconds until the timeframe ends. Using these you can monitor
how close you are to a new timeframe (at which point Current resets to 0).
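Reading those three headers and deciding whether to pause could look like this minimal sketch; the header names and example values come from the response above, while the safety margin is an arbitrary choice of mine.

```python
def rate_limit_status(headers):
    """Extract the rate-limit headers from an HTTP response's header
    dict. Returns (current, maximum, seconds_until_reset)."""
    return (
        int(headers["X-RateLimit-Current"]),
        int(headers["X-RateLimit-Max"]),
        int(headers["X-RateLimit-Reset"]),
    )

def seconds_to_wait(headers, safety_margin=10):
    """If we are within `safety_margin` requests of the cap, return the
    number of seconds to sleep until the timeframe resets (Current goes
    back to 0); otherwise return 0 and keep going."""
    current, maximum, reset = rate_limit_status(headers)
    if current >= maximum - safety_margin:
        return reset
    return 0
```

A polling loop would call `seconds_to_wait(response.headers)` after each request and `time.sleep()` for the returned value when it is non-zero.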

Retrieving a full list of stories from X until Y is likely the most
common request for our API; we'll sit down with the team, see what we
can do to support this common use case, and hopefully find something
better.

Thanks,
Will

NewToDiggApi

Nov 30, 2010, 12:29:55 PM
to Digg API
Will,

First, thanks for the sudden new energy shown by you and your team
members over the past few days. Every question is answered in detail
and valid suggestions are being acted on. Thanks again.

On the issue discussed above, as you have mentioned here and also in
the Digg comments (
http://digg.com/news/technology/introducing_digg_s_streaming_api_digg_about/20101029215107:e66adf324dfc4870be301da7b210d0d9#20101029223051:b830e45439794233bb1696a410e56c15
), removing restrictions on history is really vital. However, just
removing the history restriction, and thereby enabling lookups for time
periods X to Y, would not by itself solve some critical issues, in my
opinion.

I believe this is the best time to again remind you and the rest of the
team about a major and unresolved issue with both the website and the API.


1. Other than the stream, the main way to discover every single story
submitted is getUpcomingStories. However, Upcoming Stories is not
actually working that way; I thought this was a bug, but lately it
seems to be planned behavior. http://digg.com/news/technology/breaking_breaking_news/20101111191839:c713da83848e4ac98e9d3fefbc96a9be
This has not changed; here is what I see as Upcoming (recent):
http://i.imgur.com/D7eor.png As you can see, the list is 40 minutes
old (probably your cron or something like it failed, which is
understandable), but the list contains only a VERY small sample of the
actual submissions being made. Is there a plan to fix this at all? If
not, at least for the API part, can you add a method which exposes
every single story submitted, going back in time? I cannot resist
pointing out the irony in the comment thread above: Digg moved to a
more efficient backend in order to offer better features, but is now
citing performance issues for features which were possible on the old,
weaker backend. That seems very irrational.

2. There is also an easier workaround/fix for the above problem, which
would also be a BIG improvement to the stream. Irrespective of whether
item 1 is fixed, doing this would make entirely stream-based
applications more error-proof. The Twitter stream uses a "count"
parameter much like "return_after" on the Digg stream; however, count
allows both negative and positive values, and negative values work as
a "catch-up" mechanism. If you change "return_after" to accept both
positive and negative values, with a negative value making the stream
go back over the last x items (where x is the magnitude of the
negative number), most of the problems described in my point 1 could
be circumvented.
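From a client's perspective, the proposed behavior could be used like the following sketch. Note that the negative-value semantics of return_after are purely the proposal above, not an existing Digg API feature; the positive-value "resume after this story id" behavior is also assumed rather than documented here.

```python
def reconnect_params(last_seen_id=None, backfill=None):
    """Build stream query parameters for a reconnect.

    If we know the last story id we processed, resume after it.
    Otherwise, under the proposed catch-up semantics, pass a negative
    return_after to replay the last |backfill| items first. Both
    behaviors are assumptions about the API, per the text above."""
    params = {}
    if last_seen_id is not None:
        params["return_after"] = last_seen_id
    elif backfill:
        params["return_after"] = -abs(backfill)
    return params
```

A stream consumer that crashed without recording its position could then reconnect with `reconnect_params(backfill=500)` and deduplicate the replayed items by story id.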

3. I also see that, in theory, search.search can be used for this
purpose (finding all stories in a time range); however, it does not
really work that way. For example, this query
http://services.digg.com/2.0/search.search?max_date=1291139686&min_date=1291052700&sort=submit_date-asc&count=100&offset=4734
returns results entirely outside the requested time range. I have not
analyzed it in detail, but even the count of 450K+ submissions it
reports for a 24-hour period seems completely out of range.
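Until that is fixed, a client can at least guard against the bug by filtering results on its own side. A minimal sketch, assuming each result carries a Unix-epoch `submit_date` field (the name the sort parameter suggests, but still an assumption about the response schema):

```python
def within_range(items, min_date, max_date):
    """Client-side guard against out-of-range search results: keep
    only items whose submit_date actually falls in the inclusive
    window [min_date, max_date]. Items missing the field are dropped
    rather than trusted."""
    return [
        s for s in items
        if min_date <= s.get("submit_date", -1) <= max_date
    ]
```

Paginating through search.search and running each page through `within_range` with the same min_date/max_date used in the query would discard the stray results the query above returns.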

Once again, you guys are doing an awesome job. Just small changes and
tweaks in a few places will make huge improvements.

Thanks!

Will

Dec 1, 2010, 6:18:26 PM
to Digg API
Hi,

Replies inline.

On Nov 30, 9:29 am, NewToDiggApi <tmoha...@gmail.com> wrote:
> Will,
>
> First thanks for the sudden new energy shown by you and your team
> members in the past few days. Everything is responded in detail and
> valid suggestions are being taken at best. Thanks again.

We're glad to help!

> On the issue discussed above, as you have mentioned above and also in
> digg comments (http://digg.com/news/technology/introducing_digg_s_streaming_api_digg...
> ) removing restrictions on history is really vital. However, just
> removing the history restriction and thereby enabling lookups for time
> periods X to Y alone, would not solve some critical issues, in my
> opinion.

Exposing the full history will definitely be a powerful feature. It's
something we intend to do, but we don't have a timeframe.

> I believe this is best time to again remind you and the rest of the
> team about a major & unresolved issue with both the website and API.

Thank you. :)

> 1. Other than the stream, the main way to discover every single story
> submitted is getUpcomingStories. However, Upcoming Stories is not
> actually working so, which I thought was a bug, but lately seems as a
> planned behavior. http://digg.com/news/technology/breaking_breaking_news/20101111191839...
> This has not changed, here is what I see as Upcoming (recent) - http://i.imgur.com/D7eor.png As you can see, the list is 40 minutes
> old (probably your cron or something like that failed and is
> understandable), but the list contains only a VERY small sample of the
> actual submissions being made. Is there a plan to fix this at all? If
> no, atleast for the API part -- can you add a method which exposes
> every single story submitted, going back. I cannot resist but point
> out the irony in the above comment thread, digg made a move to a more
> efficient backend to offer better features, but now citing performance
> issues to offer features which were possible on the old weaker backend
> -- seems very irrational.

We have an approach sketched out for changing the behavior of upcoming
recent, but it'll take a bit before we're able to get around to it. In
the meantime, the Streaming API can support this use case if you have
the resources to load the data into a database on your side.
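Loading the streamed stories into a local database on the client side might look like the following sketch using SQLite; the table schema and field names are my own choices for illustration, not anything defined by the Digg API.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a local SQLite store for streamed stories."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories ("
        "  id TEXT PRIMARY KEY,"
        "  title TEXT,"
        "  topic TEXT,"
        "  submit_date INTEGER)"
    )
    return conn

def store_story(conn, story):
    """Insert one story; a re-delivered event with the same id simply
    overwrites the earlier row, so duplicates on reconnect are harmless."""
    conn.execute(
        "INSERT OR REPLACE INTO stories VALUES (?, ?, ?, ?)",
        (story["id"], story["title"], story["topic"], story["submit_date"]),
    )
    conn.commit()
```

With this in place, "all stories from X until Y" becomes a local query (`SELECT * FROM stories WHERE submit_date BETWEEN ? AND ?`) instead of a rate-limited API call.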

> 2. However, there is also one more easier workaround/fix for the above
> problem, which will also make a BIG improvement to the stream.
> Irrespective of fixing item 1, doing this will make entirely stream
> based applications more error proof. Twitter stream uses a "count"
> parameter much like the "return_after" on digg stream. However, count
> allows both -ve and +ve values. When -ve values are used, they work as
> a "catch-up" mechanism. If you can change "return_after" to accept
> both positive and negative values -- with -ve values, if the stream
> goes in the reverse for the x items, where x is the magnitude of the -
> ve number ... most of the problems encountered in my point 1 can be
> circumvented.

This would be a fairly substantial change to the Streaming API, and it
may only be a workaround for the non-real-time nature of the upcoming
data. I hope we can support these requirements with the work described
above to make upcoming real time.

> 3. I also see that, theoretically search.search can be used for this
> purpose (finding all stories in a time range), however it does not
> really work so. For example, this query http://services.digg.com/2.0/search.search?max_date=1291139686&min_da...
> gets results totally outside the time range requested for. I did not
> really analyze in detail, but even the count of 450K+ submissions it
> suggests for a 24 hr period  seems totally out of range.

Yep, this is another workaround for the lack of a complete submission
history, which we're working on addressing.

> Once again, you guys are doing an awesome job. Just small changes and
> tweaks at a few places, will make huge improvements.
>
> Thanks!

Best,
Will

LtGenPanda

Dec 3, 2010, 11:19:05 AM
to Digg API
Thanks Will, glad to see that this is going to be taken care of.
