Scrapy + ProxyMesh to crawl Google News?
From: naaboo <malte.spielber...@gmail.com>
Date: Thu, 8 Nov 2012 02:38:55 -0800 (PST)
Local: Thurs, Nov 8 2012 5:38 am
Subject: Scrapy + ProxyMesh to crawl Google News?

Heya

I need to scrape Google News items, so I need proxies, and rotating
ones at that.

I've played around with a rotating proxy script (kindly provided
here: http://mahmoud.abdel-fattah.net/2012/04/16/using-scrapy-with-differen...)
and used 150 proxies from hidemyass.com.

But Google blocked me as soon as I made a request.

So I tried using ProxyMesh, but the same thing happens :(

When I check with whatsmyip.org, I get a new IP on every request (so to me
this means the proxy middleware is configured correctly).

Do you have any tips for me to solve this problem?

THANKS!

PS: I'm running a CLI-only Ubuntu EC2 instance.
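
For readers following along, a rotating-proxy downloader middleware of the kind described in this post could look roughly like the sketch below. This is not the script from the linked blog post. The class name `RotatingProxyMiddleware` and the `PROXY_LIST` setting are hypothetical names chosen for illustration. The only Scrapy conventions relied on are that a downloader middleware exposes `process_request(request, spider)` and that the built-in HttpProxyMiddleware reads `request.meta["proxy"]`.

```python
import itertools


class RotatingProxyMiddleware:
    """Assign the next proxy from a fixed list to every outgoing request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        # cycle() loops over the list forever, one proxy per request.
        self._pool = itertools.cycle(list(proxies))

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical setting name, not a built-in one.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Scrapy's HttpProxyMiddleware picks the proxy up from request.meta.
        request.meta["proxy"] = next(self._pool)
        return None  # returning None lets Scrapy keep processing the request
```

The class would be enabled through `DOWNLOADER_MIDDLEWARES` in the project settings. Rotating the IP this way only changes the address a site sees; it does not change anything else about the request.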


 
From: naaboo <malte.spielber...@gmail.com>
Date: Fri, 9 Nov 2012 11:42:15 -0800 (PST)
Local: Fri, Nov 9 2012 2:42 pm
Subject: Re: Scrapy + ProxyMesh to crawl Google News?

I got an answer from one of the ProxyMesh guys:


 
From: Ray <max.che...@gmail.com>
Date: Wed, 14 Nov 2012 06:54:17 -0800 (PST)
Local: Wed, Nov 14 2012 9:54 am
Subject: Re: Scrapy + ProxyMesh to crawl Google News?

Hi Naaboo,

I am also using Scrapy to crawl information from a website.
I am new to it. I tried scraping the website below and can insert
the title, link and time into a MySQL db; I will use Django to show them.
Could you teach me how to crawl a news site?
Thanks, my email is ophcra...@yahoo.com

'158', 'Top', '/', '2012-11-14 22:18:58', '\n                '
'159', 'Computers: Programming: Resources', '/Computers/Programming/Resources/', '2012-11-14 22:18:58', '\n                        '
'160', 'Free Python and Zope Hosting Directory', 'http://www.oinko.net/freepython/', '2012-11-14 22:18:58', '\n            \n                    '
'161', 'Social Bug', 'http://win32com.goermezer.de/', '2012-11-14 22:18:58', '\n            \n                    '
'162', 'Computers: Programming: Languages: Python: Resources', '/Computers/Programming/Languages/Python/Resources/', '2012-11-14 22:18:59', '\n                        '
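
As an illustration of the storage step described in this post, here is a minimal sketch of a Scrapy item pipeline that writes title, link and time to a database. It is not Ray's code. To stay self-contained it uses Python's built-in `sqlite3` instead of MySQL, and the class name `LinkStoragePipeline`, the table name and the field names (`title`, `link`, `crawled_at`, `description`) are all hypothetical. The fifth column in the rows above contains only whitespace, so the sketch assumes it is a text field and collapses its whitespace before storing it.

```python
import sqlite3


class LinkStoragePipeline:
    """Store scraped items in a database table (sqlite3 stands in for MySQL)."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS links ("
            "id INTEGER PRIMARY KEY, title TEXT, link TEXT, "
            "crawled_at TEXT, description TEXT)"
        )

    def process_item(self, item, spider):
        # Collapse runs of whitespace such as '\n            ' to nothing.
        description = " ".join(item.get("description", "").split())
        self.conn.execute(
            "INSERT INTO links (title, link, crawled_at, description) "
            "VALUES (?, ?, ?, ?)",
            (item["title"], item["link"], item["crawled_at"], description),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```

With MySQL the structure would stay the same; only the connection call and the parameter placeholder style would differ depending on the driver used.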


 
From: naaboo <malte.spielber...@gmail.com>
Date: Mon, 19 Nov 2012 11:00:34 -0800 (PST)
Local: Mon, Nov 19 2012 2:00 pm
Subject: Re: Scrapy + ProxyMesh to crawl Google News?

Hi Ray,

Sorry for my late reply.

What exactly is your problem?
The data you show there seems to be fine.

I am not sure how I can help you out.

Best
Naaboo


 