Scrapy + ProxyMesh to crawl Google News?
From:
naaboo <malte.spielber... @gmail.com>
Date: Thu, 8 Nov 2012 02:38:55 -0800 (PST)
Local: Thurs, Nov 8 2012 5:38 am
Subject: Scrapy + ProxyMesh to crawl Google News?
Heya
I need to scrape Google News items, so I need rotating proxies.
I've played around with a rotating proxy script (kindly provided here: http://mahmoud.abdel-fattah.net/2012/04/16/using-scrapy-with-differen... ) and used 150 proxies from hidemyass.com
But Google blocked me as soon as I made a request.
So I tried using ProxyMesh, but the same thing happens :(
When I check with whatsmyip.org, I get a new IP on every request (so to me this means the proxy middleware is configured correctly).
Do you have any tips for me to solve this problem?
THANKS!
PS: I'm running a CLI-only Ubuntu EC2 instance.
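
For readers following along, here is a minimal sketch of the kind of rotating-proxy downloader middleware described above. It is not the linked script; the class name `RandomProxyMiddleware` and the `PROXY_LIST` setting are hypothetical names chosen for illustration. It relies only on the fact that Scrapy routes a request through whatever proxy URL is stored in `request.meta["proxy"]`.

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware that assigns a random proxy to each request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical setting name, not a built-in Scrapy
        # setting; it would hold proxy URLs such as "http://host:port".
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Scrapy sends the request through the proxy named in meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        # Returning None lets Scrapy continue processing the request.
        return None
```

To take effect, a class like this would have to be enabled in the project's `DOWNLOADER_MIDDLEWARES` setting. Note that it only changes the outgoing IP; it does nothing if the target site already blocks the proxy addresses themselves.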
From:
naaboo <malte.spielber... @gmail.com>
Date: Fri, 9 Nov 2012 11:42:15 -0800 (PST)
Local: Fri, Nov 9 2012 2:42 pm
Subject: Re: Scrapy + ProxyMesh to crawl Google News?
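I got an answer from one of the ProxyMesh guys:
> Unfortunately, Google is a popular site to scrape, and ProxyMesh has been blocked by them. So for scraping Google I always recommend http://www.trustedproxies.com/
> Jacob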
From:
Ray <max.che... @gmail.com>
Date: Wed, 14 Nov 2012 06:54:17 -0800 (PST)
Local: Wed, Nov 14 2012 9:54 am
Subject: Re: Scrapy + ProxyMesh to crawl Google News?
Hi Naaboo,
I am also using Scrapy to crawl information from a website. I am new to it. I tried scraping the website below and can insert the title, link, and time into a MySQL DB; I will use Django to display them. Could you teach me how to crawl a news site? Thanks, my email is ophcra... @yahoo.com
'158', 'Top', '/', '2012-11-14 22:18:58', '\n '
'159', 'Computers: Programming: Resources', '/Computers/Programming/Resources/', '2012-11-14 22:18:58', '\n '
'160', 'Free Python and Zope Hosting Directory', 'http://www.oinko.net/freepython/', '2012-11-14 22:18:58', '\n \n '
'161', 'Social Bug', 'http://win32com.goermezer.de/', '2012-11-14 22:18:58', '\n \n '
'162', 'Computers: Programming: Languages: Python: Resources', '/Computers/Programming/Languages/Python/Resources/', '2012-11-14 22:18:59', '\n '
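
The rows above mix relative and absolute links and carry whitespace-only values in the last column. A small sketch of how such fields could be tidied before the database insert follows. It assumes the last column is a description field, and `BASE_URL` and `clean_row` are hypothetical names; the real site address is not given in this thread.

```python
from urllib.parse import urljoin

# Hypothetical base address; replace with the site actually being crawled.
BASE_URL = "http://example.com/"


def clean_row(title, link, description, base_url=BASE_URL):
    """Strip stray whitespace and turn relative links into absolute ones."""
    return {
        "title": title.strip(),
        # urljoin leaves absolute URLs untouched and resolves relative paths.
        "link": urljoin(base_url, link.strip()),
        # Collapse runs of whitespace; a value like "\n \n " becomes "".
        "description": " ".join(description.split()),
    }
```

For example, `clean_row("Top", "/", "\n ")` would yield an absolute link for the site root and an empty description.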
From:
naaboo <malte.spielber... @gmail.com>
Date: Mon, 19 Nov 2012 11:00:34 -0800 (PST)
Local: Mon, Nov 19 2012 2:00 pm
Subject: Re: Scrapy + ProxyMesh to crawl Google News?
Hi Ray,
sorry for my late reply.
What exactly is your problem? The data you show there seems to be fine.
I am not sure how I can help you out.
Best Naaboo
On Wednesday, November 14, 2012 3:54:18 PM UTC+1, Ray wrote:
> Hi Naaboo,
> I am doing a Scrapy to crawl information from website also.
> I am new, I tried to scrapy information from below website and can insert
> the title, link time to MySQL db, I will use Django to show them.
> Could you teach me how to crawl a news site ?
> Thanks, my email ophc... @yahoo.com
> '158', 'Top', '/', '2012-11-14 22:18:58', '\n '
> '159', 'Computers: Programming: Resources', '/Computers/Programming/Resources/', '2012-11-14 22:18:58', '\n '
> '160', 'Free Python and Zope Hosting Directory', 'http://www.oinko.net/freepython/', '2012-11-14 22:18:58', '\n \n '
> '161', 'Social Bug', 'http://win32com.goermezer.de/', '2012-11-14 22:18:58', '\n \n '
> '162', 'Computers: Programming: Languages: Python: Resources', '/Computers/Programming/Languages/Python/Resources/', '2012-11-14 22:18:59', '\n '
> On Saturday, November 10, 2012 3:42:15 AM UTC+8, naaboo wrote:
>> I got an answer from one of the ProxyMesh guys:
>>> Unfortunately, Google is a popular site to scrape, and ProxyMesh has
>>> been blocked by them. So for scraping Google I always recommend
>>> http://www.trustedproxies.com/
>>> Jacob