"200 None" Error


Braxton Osting

Sep 13, 2011, 4:44:11 PM
to scrapy...@googlegroups.com
Hi, I'm new to scrapy and am having difficulty scraping information
from a particular website:
tennis.wettpoint.com/en

I wrote the following spider, which is mostly just copied and pasted
from the tutorial, replacing dmoz.org with tennis.wettpoint.com:


from scrapy.spider import BaseSpider

class player_spider(BaseSpider):

    name = "tennis.wettpoint.com"
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ["tennis.wettpoint.com"]
    start_urls = ["http://tennis.wettpoint.com/en/"]

    def parse(self, response):
        # save the page body to a file named after the last path segment
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)
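(As an aside on the tutorial's filename logic: splitting the start URL on "/" and taking the second-to-last segment yields the name the callback would use for the saved file. A quick standalone check:)

```python
# The tutorial callback names the output file after the second-to-last
# URL segment; for this start URL that segment is "en".
url = "http://tennis.wettpoint.com/en/"
filename = url.split("/")[-2]
print(filename)  # -> en
```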

When I run this at the command line using:
scrapy crawl tennis.wettpoint.com

I get the error below. I can't find anything on "200 None" in the
documentation. Any ideas?

Thanks,
Braxton

[atpData]$ scrapy crawl tennis.wettpoint.com
2011-09-13 13:28:24-0700 [scrapy] INFO: Scrapy 0.12.0.2548 started
(bot: atpData)
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled extensions:
TelnetConsole, SpiderContext, WebService, CoreStats, CloseSpider
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled scheduler
middlewares: DuplicatesFilterMiddleware
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled downloader
middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware,
UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware,
RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware,
DownloaderStats
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled spider middlewares:
HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
UrlLengthMiddleware, DepthMiddleware
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Enabled item pipelines:
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Telnet console listening on
0.0.0.0:6023
2011-09-13 13:28:24-0700 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-09-13 13:28:24-0700 [tennis.wettpoint.com] INFO: Spider opened
2011-09-13 13:28:24-0700 [tennis.wettpoint.com] DEBUG: Retrying <GET
http://tennis.wettpoint.com/> (failed 1 times): 200 None
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] DEBUG: Retrying <GET
http://tennis.wettpoint.com/> (failed 2 times): 200 None
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] DEBUG: Discarding <GET
http://tennis.wettpoint.com/> (failed 3 times): 200 None
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] ERROR: Error
downloading <http://tennis.wettpoint.com/>: [Failure instance:
Traceback (failure with no frames): <class
'twisted.web.client.PartialDownloadError'>: 200 None
]
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] INFO: Closing spider (finished)
2011-09-13 13:28:25-0700 [tennis.wettpoint.com] INFO: Spider closed (finished)

Pablo Hoffman

Oct 25, 2011, 11:05:45 AM
to scrapy...@googlegroups.com
Hi Braxton,

Thanks for reporting this. It was easier to reproduce by just running
the shell with that url:

scrapy shell http://tennis.wettpoint.com/en/

I've fixed the issue here:
https://github.com/scrapy/scrapy/commit/f4821a123d17e90686ea0b1eb9447dedcb604431
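For context: Twisted raises PartialDownloadError when a server returns a 200 status but the connection closes before the full body promised by Content-Length has arrived, which is what the "200 None" message reflects. A minimal stdlib sketch (no Scrapy or Twisted involved; the TruncatingHandler server here is purely illustrative) that reproduces the same truncated-response condition:

```python
import http.client
import http.server
import socketserver
import threading

class TruncatingHandler(http.server.BaseHTTPRequestHandler):
    """Illustrative server: promises 100 bytes but sends only 20."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "100")  # claim 100 bytes
        self.end_headers()
        self.wfile.write(b"only 20 bytes sent!!")  # send 20, then close
    def log_message(self, *args):
        pass  # keep the demo quiet

server = socketserver.TCPServer(("127.0.0.1", 0), TruncatingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/")
resp = conn.getresponse()
partial_len = None
try:
    # the stdlib raises IncompleteRead here; Twisted's client raises
    # PartialDownloadError for the same condition
    resp.read()
except http.client.IncompleteRead as exc:
    partial_len = len(exc.partial)
print(resp.status, partial_len)  # 200 status, but only 20 of 100 bytes arrived
server.shutdown()
```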

Pablo.

