10,000 requests

Tom Hume

Nov 24, 2011, 11:23:05 AM
to Guardian API Talk
Hi

A few years back I made this, using the Guardian Content API:

http://www.flickr.com/photos/twhume/3407112348/

I've had an idea for an updated version that goes a bit further. It'd
involve running roughly 100,000 queries against the Content API. Would
that be feasible? If so, how could I best arrange it to cause the
Guardian least problems - are there any times of day which would be
best?

Thanks
Tom

Michael Brunton-Spall

Nov 24, 2011, 11:59:08 AM
to guardian...@googlegroups.com
Hey Tom,

Yes I still talk about that demo as an awesome use of the API :)

I'm curious as to why you might want to run so many queries against the API to do this though.  Do you have a wordlist of 100,000 words or was there some other architectural requirement?
If it was a case of doing the query against dates, I suspect a query something like http://explorer.content.guardianapis.com/#/search?q=fuck&format=json&show-refinements=date using our refinements system would work for aggregate data which shows the results broken down by year.
You could then automate the generation of queries like 
and so on to generate 12 queries per year per word which shouldn't add up to too much.
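
For anyone who lost the URLs to munging, here is a minimal sketch of how a refinements query could be built per phrase. It reuses only the `q`, `format` and `show-refinements` parameters from the explorer link above. The endpoint `http://content.guardianapis.com/search` and the helper name `build_refinement_url` are assumptions for illustration, not a confirmed part of the API.

```python
from urllib.parse import urlencode

# Assumed endpoint; the thread only shows the explorer URL in full.
SEARCH_ENDPOINT = "http://content.guardianapis.com/search"


def build_refinement_url(phrase: str) -> str:
    """Build a search URL that asks for date refinements for one phrase.

    Hypothetical helper: the parameter names come from the explorer link
    in the thread, everything else is an illustration.
    """
    params = {
        "q": phrase,                 # the word or phrase to count
        "format": "json",            # response format used in the example link
        "show-refinements": "date",  # ask for counts grouped by date
    }
    # urlencode escapes spaces and punctuation so phrases are URL-safe.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_refinement_url("hello world"))
```

Building the URL with `urlencode` rather than string concatenation keeps multi-word phrases intact, which matters for a list of expressions rather than single words.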

I hope that helps.

Michael Brunton-Spall
Developer Advocate
guardian.co.uk




Tom Hume

Nov 24, 2011, 4:18:28 PM
to Guardian API Talk
I have a list of 10,000 filthy expressions which I would like to
search for and graph individually across the last 10 years - if I can
structure that kinda query better than my simple-minded brute-force
brain thought, maybe it could be lighter? I expect a large number of
them to not feature at all, and a small number to produce enough to
make a big graph.

Not interested in the content of the responses, just presence of
phrases.

Google groups seems to munge those URLs you posted unpleasantly, so
I'm not completely following your suggestion - would it let me search
for a phrase across 10 years, grouped by year, and reduce the load to
10k queries? Would 10k be feasible, even? (Maybe this idea is itself
just not practical)


Tom Hume

Nov 30, 2011, 9:31:31 AM
to Guardian API Talk
Apologies Michael, I thought I'd replied to this.

I have a list of 10,000 phrases I'd like to run against the API, and
plot each one over 10 years - so was thinking that'd be 100k queries
in total. I'm only interested in the number of matches for each
phrase, not content; but if it'd be better for me to do 10,000
queries, each over 10 years, and work out the yearly totals myself...
I could do that.

Happy to be guided by you as to the best method, and could also run
these over a period of time or in quiet periods if that helps.

I couldn't follow the URLs you posted, and Google Groups seems to have
truncated them...

Tom


Michael Brunton-Spall

Dec 2, 2011, 11:37:36 AM
to guardian...@googlegroups.com, twh...@gmail.com
Sorry for the delay. I've emailed you back directly as well, to try to prevent URL munging.

Yes, refinements do exactly that: they give you a count of occurrences broken down by year. Unfortunately we don't break it down any further, but if you want to check each of the last 10 years (giving you only 10 datapoints per filthy word), you could simply make one call per word, saving you 90% of your calls.

With an API key you are restricted to about 15,000 queries per day, so you could run the whole lot in one day. I think we also restrict queries per second, but at a fairly high number for API-key-based calls, so you can probably make, say, 5 calls a second fairly easily.
These sorts of calls are fairly cheap for our system to process, so there isn't any need to let us know; we make significantly more calls than that a day from our internal systems.
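
As a rough sketch of what that means in practice, the snippet below estimates how long a run would take under the figures quoted above (about 15,000 queries per day, roughly 5 calls a second) and paces a loop to stay under the per-second rate. The function names and the `fetch_count` callback are hypothetical, and the limits are the approximate numbers from this message rather than guaranteed values.

```python
import math
import time
from typing import Callable, Dict, Iterable, Tuple


def estimate_run(n_queries: int,
                 daily_limit: int = 15_000,
                 calls_per_second: float = 5.0) -> Tuple[int, float]:
    """Return (days needed under the daily quota, seconds at full rate)."""
    days = math.ceil(n_queries / daily_limit)
    seconds = n_queries / calls_per_second
    return days, seconds


def run_throttled(phrases: Iterable[str],
                  fetch_count: Callable[[str], object],
                  calls_per_second: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict[str, object]:
    """Call fetch_count once per phrase, never faster than calls_per_second.

    fetch_count is a placeholder for whatever performs the real API call.
    """
    interval = 1.0 / calls_per_second
    results: Dict[str, object] = {}
    for phrase in phrases:
        started = time.monotonic()
        results[phrase] = fetch_count(phrase)
        elapsed = time.monotonic() - started
        if elapsed < interval:
            # Wait out the rest of this call's time slot.
            sleep(interval - elapsed)
    return results


if __name__ == "__main__":
    print(estimate_run(10_000))   # one call per phrase
    print(estimate_run(100_000))  # one call per phrase per year
```

Under these assumed limits, one call per phrase fits within a single day's quota, while the original 100,000-call plan would spread over several days.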

Hope that helps.

Michael Brunton-Spall
Developer Advocate
guardian.co.uk


Tom Hume

Dec 10, 2011, 9:46:33 AM
to Michael Brunton-Spall, guardian...@googlegroups.com
Thanks Michael.

The JSON returned by your example URL seems to go back 5 years; how would I make it give me 10 years' worth of refinement groups?

Tom