Scrapy errors out on Kickass Torrents

Ricky Huang

Sep 18, 2015, 2:44:26 PM
to scrapy-users
Hello all,

I am building a scraper for Kickass Torrents (kat.cr) to collect torrent information and other details. I tested it in the Scrapy shell, and Scrapy keeps erroring out:

>>> fetch("https://kat.cr/south-park-s19e01-720p-hdtv-x264-killers-rartv-t11271450.html")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 90, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
TCPTimedOutError: TCP connection timed out: 60: Operation timed out.

However, I can browse the site in a web browser, so the site itself is definitely up.

Can anyone shed some light on this issue?


Thanks in advance.

Travis Leleu

Sep 18, 2015, 3:31:36 PM
to scrapy-users
Most likely they are blocking your User-Agent (or possibly your IP). This is a basic anti-scraping measure, and a User-Agent block is easily avoided by changing Scrapy's User-Agent.
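A User-Agent filter can only act once the TCP connection is open and the request headers have been read, so it normally shows up as an HTTP error status (often 403) rather than as a connection timeout. The toy sketch below illustrates that difference. It assumes Python 3 and uses only the standard library, not Scrapy; `UABlockingHandler` is a hypothetical local server that rejects any User-Agent mentioning Scrapy, and `fetch_status` is a hypothetical helper. It is not how kat.cr actually filters requests.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class UABlockingHandler(BaseHTTPRequestHandler):
    """Hypothetical toy server: answers 403 to any User-Agent containing 'scrapy'."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        status = 403 if "scrapy" in ua.lower() else 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Opener with proxy auto-detection disabled, so requests go straight to the target.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url: str, user_agent: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request sent with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx status still means the TCP connection was established.
        return err.code


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), UABlockingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    print(fetch_status(url, "Scrapy/1.0 (+http://scrapy.org)"))
    print(fetch_status(url, "Mozilla/5.0 (X11; Linux x86_64)"))
    server.shutdown()
    server.server_close()
```

With a filter like this, changing the User-Agent turns the 403 into a 200. A block at the IP level would look different: the connection itself would fail or time out.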

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scrapy-users...@googlegroups.com.
To post to this group, send email to scrapy...@googlegroups.com.
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.

Ricky Huang

Sep 18, 2015, 7:15:02 PM
to scrapy...@googlegroups.com
Thank you for the help. I think you are right about kat.cr blocking my server. I switched to another server and was able to crawl the site just fine.

I looked through the documentation, and I think the correct way to do this is to set “USER_AGENT” in the settings.py file to something like the following:

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"

Is that the correct way to do it? kat is still blocking me with that in place. Are there any other settings I need to add or change to alter my crawler's signature?
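Setting USER_AGENT in settings.py is the standard project-wide way to replace Scrapy's default "Scrapy/VERSION (+...)" string. A few other real Scrapy settings also change how a crawler looks or behaves; the fragment below is a sketch with illustrative values (assumptions, not recommendations from this thread). None of these settings help if the block is at the IP level.

```python
# settings.py (fragment): illustrative values only; tune them for the site you crawl.

# Identify as a desktop browser instead of Scrapy's default User-Agent.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
)

# Headers added to every request (User-Agent itself comes from USER_AGENT above).
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

# Crawl more slowly and less predictably.
DOWNLOAD_DELAY = 2.0               # seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True    # wait between 0.5x and 1.5x DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 2
AUTOTHROTTLE_ENABLED = True        # adjust the delay based on server response times

# Some sites track crawlers through cookies; disabling them is one more option.
COOKIES_ENABLED = False
```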


Thanks again.



Travis Leleu

Sep 18, 2015, 8:28:20 PM
to scrapy-users
You probably have an IP ban.  Make your requests from a different IP address.
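The TCPTimedOutError in the original traceback is most likely raised while the connection is being set up, before any HTTP request (and therefore any User-Agent) is sent, which fits an IP-level block. A minimal sketch for checking this, assuming Python 3; `tcp_handshake_ok` is a hypothetical helper name. It opens a bare TCP connection and sends nothing over it.

```python
import socket


def tcp_handshake_ok(host: str, port: int = 443, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    No HTTP request is sent, so the User-Agent plays no part in the result.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout), refused connections,
        # unreachable networks and DNS lookup failures.
        return False
```

Run it from the scraping server and from a machine where the site loads in a browser. If `tcp_handshake_ok("kat.cr")` is False on the server but True elsewhere, the server's IP is probably blocked (or that server has a network problem), and changing USER_AGENT will not help.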