I wrote the following spider, which is mostly just copied and pasted
from the tutorial, replacing dmoz.org with tennis.wettpoint.com:
from scrapy.spider import BaseSpider

class player_spider(BaseSpider):
    name = "tennis.wettpoint.com"
    allowed_domains = ["http://tennis.wettpoint.com/en/"]
    start_urls = ["http://tennis.wettpoint.com/en/"]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)
When I run this at the command line using:
scrapy crawl tennis.wettpoint.com
I get the error below. I can't find anything on "200 None" in the
documentation. Any ideas?
Thanks,
Braxton
[atpData]$ scrapy crawl tennis.wettpoint.com
2011-09-13 13:28:24-0700 [scrapy] INFO: Scrapy 0.12.0.2548 started
(bot: atpData)
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled extensions:
TelnetConsole, SpiderContext, WebService, CoreStats, CloseSpider
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled scheduler
middlewares: DuplicatesFilterMiddleware
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled downloader
middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware,
UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware,
RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware,
DownloaderStats
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled spider middlewares:
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
UrlLengthMiddleware, DepthMiddleware
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled item pipelines:
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Telnet console listening on
0.0.0.0:6023
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-09-13 13:28:24-0700 [tennis.wettpoint.com] INFO: Spider opened
2011-09-13 13:28:24-0700 [tennis.wettpoint.com] DEBUG: Retrying <GET
http://tennis.wettpoint.com/> (failed 1 times): 200 None
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] DEBUG: Retrying <GET
http://tennis.wettpoint.com/> (failed 2 times): 200 None
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] DEBUG: Discarding <GET
http://tennis.wettpoint.com/> (failed 3 times): 200 None
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] ERROR: Error
downloading <http://tennis.wettpoint.com/>: [Failure instance:
Traceback (failure with no frames): <class
'twisted.web.client.PartialDownloadError'>: 200 None
]
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] INFO: Closing spider (finished)
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] INFO: Spider closed (finished)
Thanks for reporting this. It was easier to reproduce by just running
the shell with that url:
scrapy shell http://tennis.wettpoint.com/en/
I've fixed the issue here:
https://github.com/scrapy/scrapy/commit/f4821a123d17e90686ea0b1eb9447dedcb604431
Pablo.
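
For context on what "200 None" means: the traceback shows Twisted's
twisted.web.client.PartialDownloadError, which is typically raised when the
server reports a 200 status but closes the connection before delivering as
many bytes as its Content-Length header promised. Below is a minimal sketch,
not the actual Scrapy patch linked above, of how a plain Twisted client can
still recover the partial body from that error; the URL constant and the
recovery behaviour are assumptions for illustration only:

    # Sketch only: reproduce and recover from PartialDownloadError with
    # Twisted's getPage, outside of Scrapy.
    from twisted.internet import reactor
    from twisted.web.client import getPage, PartialDownloadError

    URL = b"http://tennis.wettpoint.com/en/"  # the URL from the report

    def on_body(body):
        print("got %d bytes" % len(body))
        reactor.stop()

    def on_error(failure):
        if failure.check(PartialDownloadError):
            # failure.value.response holds whatever bytes arrived before
            # the connection was cut short; using them as the response
            # body is one way to keep the request instead of discarding it.
            print("partial download, recovered %d bytes"
                  % len(failure.value.response))
        else:
            failure.printTraceback()
        reactor.stop()

    getPage(URL).addCallbacks(on_body, on_error)
    reactor.run()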