Automatic crawl google for search results

Sarabjeet

Jun 15, 2008, 8:48:25 PM
to Google Data Protocol

Hi,

I am working on a project where I need to crawl Google search to
download the first 50 results for a hundred queries and store them on
my computer. I do not need a web page or anything like that. What is
the best API for me to use? I found out that Google does not allow
client programs to crawl its websites directly. Also, I am not at
all familiar with JavaScript, but I know Python. The Python gdata
library, unfortunately, does not have the search APIs.

Kindly reply asap.

thanks
Sarabjeet

Jeff Fisher (Google)

Jun 16, 2008, 12:36:37 AM
to Google Data Protocol
Hi Sarabjeet,

The only web search API we have is the AJAX Search API. There is a way
to use it in non-JavaScript environments, however:

http://code.google.com/apis/ajaxsearch/documentation/#fonje

Since it's returning a JSON object you probably want to look at
json.py:

http://sourceforge.net/projects/json-py/
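
For illustration, here is a minimal sketch of pulling titles and URLs out of one JSON response. It uses Python's standard-library json module rather than json.py, and the sample payload and its field names ("responseData", "results", "url", "titleNoFormatting") are assumptions made for the example, not something confirmed in this thread.

```python
import json

# Hypothetical sample payload. The field names used here are assumptions
# for illustration; check a real response before relying on them.
SAMPLE_RESPONSE = """
{"responseData": {"results": [
    {"url": "http://www.google.com/", "titleNoFormatting": "Google"},
    {"url": "http://example.com/", "titleNoFormatting": "Example"}
]}, "responseStatus": 200}
"""


def extract_results(raw_json):
    """Return a list of (title, url) pairs from one JSON response string."""
    data = json.loads(raw_json)
    # "responseData" may be missing or null, so fall back to empty containers.
    response_data = data.get("responseData") or {}
    results = response_data.get("results") or []
    return [(r.get("titleNoFormatting", ""), r.get("url", "")) for r in results]


if __name__ == "__main__":
    for title, url in extract_results(SAMPLE_RESPONSE):
        print(title, url)
```

The pairs returned by extract_results could then be written to a file per query if the results need to be stored locally.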

Cheers,
-Jeff

Ray Baxter

Jun 16, 2008, 12:40:12 AM
to google-he...@googlegroups.com
There is a RESTful interface to Google search: http://code.google.com/apis/ajaxsearch/documentation/reference.html#_intro_fonje

You will need a valid referrer page, perhaps something describing your project and how to contact you. There is some limitation on the number of results that can be obtained, but I don't see it documented. In my brief testing, the current limitation is 100 results, i.e. http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Google&start=24 returns results, but http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Google&start=25 doesn't.

The results are json. The query ignores any alt=xml parameter.
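
As a rough sketch of how the paging described above might be scripted in Python: the code below only builds the URLs and request objects for one query and does not contact the service. The v, q and start parameters come from the URLs quoted above; the page_size value and the referer URL are hypothetical placeholders, and the service may stop returning results past some start offset, as noted above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the URLs quoted in this message.
BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"


def build_search_url(query, start=0):
    """Build the search URL for one page of results."""
    params = {"v": "1.0", "q": query, "start": start}
    return BASE_URL + "?" + urlencode(params)


def build_requests(query, wanted=50, page_size=4,
                   referer="http://example.com/my-project"):
    """Build one Request per page until `wanted` results are covered.

    page_size and referer are assumptions for illustration: adjust
    page_size to however many results one response actually holds, and
    point referer at a real page describing your project.
    """
    requests = []
    for start in range(0, wanted, page_size):
        url = build_search_url(query, start)
        requests.append(Request(url, headers={"Referer": referer}))
    return requests


if __name__ == "__main__":
    for req in build_requests("Google"):
        print(req.full_url)
```

Each Request could then be passed to urllib.request.urlopen, with a pause between calls, and the response body saved to disk.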


Ray

yong liu

Jun 16, 2008, 1:54:55 AM
to google-he...@googlegroups.com
I am building an application in VB.NET and now need to connect to Google Contacts to retrieve information about a user's contacts. How can I do this?

Please help me as soon as possible, preferably with step-by-step instructions.

Thank you.

Sarabjeet

Jun 17, 2008, 4:24:31 PM
to Google Data Protocol
Hi,

Thanks for your help.
I tried using wget as explained in http://code.google.com/apis/ajaxsearch/documentation/.
But there is some issue with wget because it gives an error: Unsupported
Scheme.
If you have an idea about wget, is SSL required for this purpose?
I have looked in many forums for wget but haven't been able to solve
this issue.

thanks
Sarabjeet

Ray Baxter

Jun 17, 2008, 6:39:37 PM
to google-he...@googlegroups.com

On Jun 17, 2008, at 1:24 PM, Sarabjeet wrote:

>
> Hi,
>
> Thanks for your help.
> I tried using wget as explained in http://code.google.com/apis/ajaxsearch/documentation/.
> But there is some issue with wget because it gives an error: Unsupported
> Scheme.
> If you have an idea about wget, is SSL required for this purpose?
> I have looked in many forums for wget but haven't been able to solve
> this issue.

SSL is not required. This works:

wget "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Google"


The Unsupported scheme error implies that you are trying to do
something like

wget "htpp://ajax.googleapis. ...."

Copy and paste exactly the command you ran, along with the error message
that you receive. Perhaps that will offer some clue.
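
Since you know Python, one quick way to spot this kind of typo is to check the scheme of the URL before handing it to wget. This is only a sketch: the SUPPORTED_SCHEMES set is an assumption for illustration, not wget's actual list.

```python
from urllib.parse import urlsplit

# Illustrative set of schemes; an assumption, not taken from wget itself.
SUPPORTED_SCHEMES = {"http", "https", "ftp"}


def has_supported_scheme(url):
    """Return True if the part of the URL before '://' is a known scheme."""
    return urlsplit(url).scheme in SUPPORTED_SCHEMES


if __name__ == "__main__":
    for candidate in (
        "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Google",
        "htpp://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Google",
    ):
        print(candidate, has_supported_scheme(candidate))
```

A misspelled scheme such as "htpp", or a stray space before the colon, makes the check return False.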

Ray
