Scrapy errors out on Kickass Torrents

Ricky Huang

Sep 18, 2015, 2:44:26 PM

to scrapy-users

Hello all,

I am building a scraper for Kickass Torrents (kat.cr) to collect torrent information. When I test it in the Scrapy shell, the request keeps failing:

>>> fetch("https://kat.cr/south-park-s19e01-720p-hdtv-x264-killers-rartv-t11271450.html")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 90, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
TCPTimedOutError: TCP connection timed out: 60: Operation timed out.

However, I can browse the site in a web browser, so the site itself is up.

Can anyone shed some light on this issue for me?

Thanks in advance.

Travis Leleu

Sep 18, 2015, 3:31:36 PM

to scrapy-users

Most likely they are blocking your User-Agent (or possibly your IP address). Blocking by User-Agent is a basic anti-scraping measure, and it is easy to get around by changing the User-Agent that Scrapy sends.


Ricky Huang

Sep 18, 2015, 7:15:02 PM

to scrapy...@googlegroups.com

Thank you for the help. I think you are right that kat.cr is blocking my server. I switched to another server and was able to crawl the site just fine.

I looked in the documentation, and I think the correct way to change the User-Agent is to set "USER_AGENT" in the settings.py file to something like the following:

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"

Is that the correct way to do it? kat.cr is still blocking the original server with that in place. Are there any other settings I need to add or change to alter my crawler's signature?

Thanks again.
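
For reference, a minimal settings.py sketch of the settings that shape what a Scrapy crawler sends. USER_AGENT, DEFAULT_REQUEST_HEADERS, DOWNLOAD_DELAY and DOWNLOAD_TIMEOUT are standard Scrapy settings; the specific values below are illustrative assumptions and have not been tested against kat.cr. None of them will help if the block is applied to the IP address.

```python
# settings.py (sketch; values are illustrative assumptions)

# Browser-like User-Agent, same string as in the post above.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
)

# Headers added to every request unless the request sets its own.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Seconds the downloader waits before giving up on a request.
DOWNLOAD_TIMEOUT = 30
```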


Travis Leleu

Sep 18, 2015, 8:28:20 PM

to scrapy-users

You probably have an IP ban. Make your requests from a different IP address.
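
One way to check this from the affected server, sketched with only the Python standard library: a TCPTimedOutError means the TCP connection was never established, so no HTTP headers (including the User-Agent) were ever sent. The helper name can_open_tcp is hypothetical, and the code is written for Python 3 rather than the Python 2.7 shown in the traceback.

```python
import socket


def can_open_tcp(host, port=443, timeout=10.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: it only tests the TCP handshake and sends no
    HTTP request, so the User-Agent plays no part in the result.
    """
    try:
        # create_connection resolves the host and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False
```

Calling can_open_tcp("kat.cr") on the blocked server and on a working one would show whether the failure happens below HTTP. If it returns False only on the blocked server, changing USER_AGENT cannot help, and a different IP address is needed.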
