Hi Pablo,
The DOWNLOAD_DELAY seems to work as expected now.
I'm still a little confused, though. I have the following
DownloaderMiddleware:
from scrapy import log

class LogMiddleware(object):
    def process_request(self, request, spider):
        log.msg('Sending request: %s' % request, log.INFO)
        # returning None lets the request continue downstream

    def process_response(self, request, response, spider):
        log.msg('Receiving response: %s' % response, log.INFO)
        return response
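For reference, I enable it like this (the myproject.middlewares path is just
where it happens to live in my project):

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.LogMiddleware': 1000,
}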
So its priority is 1000, i.e. it is the last one to be executed. When I
start crawling (supposing start_requests() yields many requests) with:
CONCURRENT_REQUESTS = 5
CONCURRENT_REQUESTS_PER_DOMAIN = 2
DOWNLOAD_DELAY = 10
I instantly receive 5 "Sending request: ..." messages. The requests are
probably queued somewhere, because the responses come in at 10-second
intervals. But after reading
http://doc.scrapy.org/en/0.16/topics/architecture.html, it seems that
either the requests are queued inside the Downloader component, or the
flow is not exactly as depicted there. Which one is correct?
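In case it helps, here is a minimal spider sketch that reproduces the
behaviour for me (the spider name and URLs are just placeholders):

from scrapy.spider import BaseSpider
from scrapy.http import Request

class DelayTestSpider(BaseSpider):
    name = 'delay_test'

    def start_requests(self):
        # yield more requests than CONCURRENT_REQUESTS so the queuing
        # behaviour shows up clearly in the log output
        for i in range(20):
            yield Request('http://example.com/page/%d' % i)

    def parse(self, response):
        pass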
Anyway, another small bugfix:
- in scrapy/core/scraper.py, line 215, replace:
      log.err(output, 'Error processing %(item)s', item=item, spider=spider)
  with:
      log.err(output, 'Error processing %s' % item, spider=spider)
  (as far as I can tell, the _why message is passed through to Twisted
  verbatim, so the %(item)s placeholder never gets interpolated)
And two suggestions:
- for DjangoItem's save(), add a validate argument and run the model's
  full_clean() when validate is True (see the sketch below)
- for the "shell" command, add a "--spider" option to select which
  spider to use in the shell (e.g. scrapy shell <url> --spider=myspider)
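A rough sketch of the DjangoItem change (illustrative only: the real save()
in scrapy.contrib.djangoitem already builds the model instance from the
item's values, and I am only adding the validate branch):

from scrapy.item import Item

class DjangoItem(Item):
    django_model = None

    def save(self, commit=True, validate=False):
        # build the model instance from the item's field values,
        # as the current save() already does
        modelargs = dict((k, self[k]) for k in self._values)
        model = self.django_model(**modelargs)
        if validate:
            # run Django's model validation before saving; raises
            # django.core.exceptions.ValidationError on invalid data
            model.full_clean()
        if commit:
            model.save()
        return model

Spiders and pipelines could then opt in with item.save(validate=True) and
handle ValidationError however they see fit.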
I have these things implemented in my local repository, so perhaps I could contribute them?
Regards,
Mimino