How to avoid a security question? 429 even in Scrapy shell for a single page


enric...@gmail.com

Apr 24, 2016, 9:54:49 AM
to scrapy-users
Hey everyone,

I'm having issues scraping a site that keeps returning a 429 response. After checking the site in Scrapy shell and looking at the response, I noticed that even if I fetch only the first page, I'm presented with a security question along the lines of "oops, this looks suspicious, prove that you are not a robot".
I have changed the User-Agent, I'm using ProxyMesh, I have disabled cookies, and I have tried multiple download delays. Now I'm out of ideas. In my browser everything works as it should.

Something unmasks my bot as a bot.

Any ideas of how to tackle this?
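For reference, the measures listed in the question map onto Scrapy settings roughly as below. This is a sketch, not the poster's actual configuration: the header values are placeholders to be replaced with whatever your own browser sends, and the delay numbers are arbitrary. By default Scrapy identifies itself with a `Scrapy/<version>` User-Agent and sends only `Accept` and a bare `Accept-Language: en`, which is one common way a crawler stands out. ProxyMesh is not configured here; Scrapy's `HttpProxyMiddleware` picks up a proxy from the `http_proxy`/`https_proxy` environment variables or from `request.meta['proxy']`.

```python
# settings.py (sketch): the knobs mentioned in the question, plus browser-like headers.
# Header values below are illustrative placeholders; copy the ones your browser sends.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36"
)

# Override Scrapy's minimal defaults (Accept + "Accept-Language: en").
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}

# Disabled as described in the question. If the site sets a cookie on the
# first visit and expects it back, leaving this True may work better.
COOKIES_ENABLED = False

# Slow down: one request at a time per domain, with a randomized delay.
DOWNLOAD_DELAY = 5
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let AutoThrottle adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60

# List 429 explicitly so rate-limited responses are retried, not treated as final.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]
```

Note that none of this helps if the site fingerprints something other than headers and timing; comparing the actual request on the wire against the browser's is the more reliable way to find the difference.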

lnxpgn

Apr 24, 2016, 11:05:58 AM
to scrapy...@googlegroups.com
Hi,

I think the problem might be your Scrapy settings, since your browser works fine with ProxyMesh.

You could use Scrapy to fetch http://www.xhaus.com/headers or a similar header-echo page, and then check that the HTTP request headers are what you expect.
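If an external echo page is unavailable, a tiny local server can play the same role. The sketch below (standard library only; the file name `header_echo.py` and the functions `make_server`/`serve` are hypothetical) replies with the request line and every header exactly as received, so you can point both Scrapy and your browser at it and compare.

```python
# header_echo.py (hypothetical name): a local stand-in for a header-echo page.
from http.server import BaseHTTPRequestHandler, HTTPServer


class HeaderEchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Echo the request line and every header, in the order received.
        lines = [self.requestline] + [f"{k}: {v}" for k, v in self.headers.items()]
        body = ("\n".join(lines) + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    # Bind to localhost only; port 0 lets the OS pick a free port.
    return HTTPServer(("127.0.0.1", port), HeaderEchoHandler)


def serve(port: int = 8000) -> None:
    # Blocks until interrupted with Ctrl+C.
    make_server(port).serve_forever()
```

Usage: start it with `python -c "from header_echo import serve; serve()"`, then run `scrapy shell http://127.0.0.1:8000/` and `print(response.text)`; open the same URL in your browser and compare the two listings. This bypasses ProxyMesh, so it shows what Scrapy itself sends, not what the proxy forwards.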

You can also disable ProxyMesh first, capture network packets on your machine with tcpdump, Wireshark, or a similar tool, and check the HTTP request and response headers.
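Once you have both requests captured (for example by copying the request part of Wireshark's "Follow TCP Stream" view), a small helper can diff them. This is a hypothetical sketch using only the standard library; it works only on plain HTTP, since HTTPS traffic is encrypted on the wire. Header names are compared case-insensitively, and if a header repeats, only its last value is kept.

```python
import io
from http.client import parse_headers
from typing import Dict, List, Tuple


def parse_raw_request(raw: bytes) -> Tuple[str, Dict[str, str]]:
    # Split off the request line ("GET / HTTP/1.1"), tolerating \r\n or \n endings.
    request_line, _, rest = raw.partition(b"\n")
    # parse_headers reads header lines up to the first blank line.
    headers = parse_headers(io.BytesIO(rest))
    return (
        request_line.rstrip(b"\r").decode("latin-1"),
        {k.lower(): v for k, v in headers.items()},
    )


def diff_headers(
    bot: Dict[str, str], browser: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    # Headers the browser sends but the bot does not, the reverse, and
    # headers present in both but with different values.
    missing = sorted(set(browser) - set(bot))
    extra = sorted(set(bot) - set(browser))
    changed = sorted(k for k in set(bot) & set(browser) if bot[k] != browser[k])
    return missing, extra, changed
```

Anything in `missing` or `changed` is a candidate for what gives the bot away; header order and TLS details are not covered by this comparison.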
