Hi there,
I am trying to reproduce the tutorial. Everything seems fine until the data-extraction step, where I run:
scrapy shell '
http://www.zeit.de/index'
This gives me an error here:
2016-10-19 11:36:30 [scrapy] INFO: Enabled item pipelines:
[]
2016-10-19 11:36:30 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-10-19 11:36:30 [scrapy] INFO: Spider opened
2016-10-19 11:36:30 [scrapy] DEBUG: Retrying <GET http://'http:/robots.txt> (failed 1 times): DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
2016-10-19 11:36:30 [scrapy] DEBUG: Retrying <GET http://'http:/robots.txt> (failed 2 times): DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
2016-10-19 11:36:30 [scrapy] DEBUG: Gave up retrying <GET http://'http:/robots.txt> (failed 3 times): DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
2016-10-19 11:36:30 [scrapy] ERROR: Error downloading <GET http://'http:/robots.txt>: DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
DNSLookupError: DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
2016-10-19 11:36:30 [scrapy] DEBUG: Retrying <GET http://'
http://www.zeit.de/index'> (failed 1 times): DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
2016-10-19 11:36:30 [scrapy] DEBUG: Retrying <GET http://'
http://www.zeit.de/index'> (failed 2 times): DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
2016-10-19 11:36:30 [scrapy] DEBUG: Gave up retrying <GET http://'
http://www.zeit.de/index'> (failed 3 times): DNS lookup failed: address "'http:" not found: [Errno 11001] getaddrinfo failed.
What follows is a Python traceback. I also tried the tutorial URLs, with the same result.
I think the problem is this request:
2016-10-19 11:36:30 [scrapy] DEBUG: Retrying <GET http://'http:/robots.txt>
This URL is obviously wrong: it looks as if the opening quote (and the line break after it) became part of the address. Any hints on how this can be fixed?
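To check my suspicion, I fed the argument I think my shell actually passed to Scrapy (reconstructed from the log above; the leading quote and line break are my guess) into Python's urlparse:

```python
from urllib.parse import urlparse

# What I believe the shell handed to scrapy: an opening quote,
# a line break, then the URL (my reconstruction from the log).
arg = "'\nhttp://www.zeit.de/index'"

parsed = urlparse(arg)
print(repr(parsed.scheme))  # '' -- the leading quote means no valid scheme is found
print(repr(parsed.path))    # the whole string ends up in the path
```

With no scheme detected, Scrapy apparently falls back to prepending http://, which would explain the bogus GET http://' requests in the log.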
Cheers,
Anne