A few years back I made this, using the Guardian Content API:
http://www.flickr.com/photos/twhume/3407112348/
I've had an idea for an updated version that goes a bit further. It'd
involve running roughly 100,000 queries against the Content API. Would
that be feasible? If so, how could I best arrange it to cause the
Guardian the least trouble? Are there any times of day that would be
best?
Thanks
Tom
--
You received this message because you are subscribed to the Google Groups "Guardian API Talk" group.
I'm not interested in the content of the responses, just the presence
of phrases.
Google Groups seems to munge the URLs you posted, so I'm not
completely following your suggestion. Would it let me search for a
phrase across 10 years, grouped by year, and reduce the load to 10k
queries? Would 10k even be feasible? (Maybe this idea is itself just
not practical.)
On Nov 24, 4:59 pm, Michael Brunton-Spall <michael.brunton-
sp...@guardian.co.uk> wrote:
> Hey Tom,
>
> Yes I still talk about that demo as an awesome use of the API :)
>
> I'm curious as to why you might want to run so many queries against the API
> to do this though. Do you have a wordlist of 100,000 words or was there
> some other architectural requirement?
> If it was a case of doing the query against dates, I suspect a query
> something like http://explorer.content.guardianapis.com/#/search?q=fuck&format=json&...
> our refinements system would work for aggregate data which shows the
> results broken down by year.
> You could then automate the generation of queries like http://content.guardianapis.com/search?callback=jsonp1322152565012&fo... http://content.guardianapis.com/search?callback=jsonp1322152565012&fo...
I have a list of 10,000 phrases I'd like to run against the API, and
plot each one over 10 years - so was thinking that'd be 100k queries
in total. I'm only interested in the number of matches for each
phrase, not content; but if it'd be better for me to do 10,000
queries, each over 10 years, and work out the yearly totals myself...
I could do that.
Happy to be guided by you as to the best method, and could also run
these over a period of time or in quiet periods if that helps.
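
For concreteness, here is a rough sketch of the one-query-per-phrase-per-year approach, throttled so the requests are spread out over time. It is only an illustration under assumptions: the parameter names (q, from-date, to-date, page-size, api-key), the quoted-phrase syntax and the response.total field are taken from my reading of the public Content API documentation and should be checked before use, and the helper names and the one-second delay are made up for this example.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed endpoint and parameter names; verify against the Content API docs.
BASE_URL = "https://content.guardianapis.com/search"


def year_query_url(phrase, year, api_key):
    """Build a search URL for one phrase restricted to one calendar year."""
    params = {
        "q": '"%s"' % phrase,            # quoted for a phrase match (assumed syntax)
        "from-date": "%d-01-01" % year,
        "to-date": "%d-12-31" % year,
        "page-size": 1,                  # only the match count is wanted, not articles
        "api-key": api_key,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def total_from_response(body):
    """Pull the match count out of a JSON body (assumed shape: response.total)."""
    return json.loads(body)["response"]["total"]


def count_by_year(phrase, years, api_key, delay_seconds=1.0):
    """Return {year: match count} for one phrase, pausing between requests."""
    counts = {}
    for year in years:
        url = year_query_url(phrase, year, api_key)
        with urllib.request.urlopen(url) as resp:
            counts[year] = total_from_response(resp.read().decode("utf-8"))
        time.sleep(delay_seconds)        # hypothetical throttle to keep the load low
    return counts
```

Calling count_by_year("some phrase", range(2002, 2012), api_key) would issue ten requests for that phrase. At one request per second, 100,000 requests would take a little under 28 hours, so the delay could be raised or the run restricted to quiet periods if that is kinder to the API.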
I couldn't follow the URLs you posted, and Google Groups seems to have
truncated them...
Tom