Balthazar Rouberol
Mar 25, 2013, 9:06:29 AM
to scrapy...@googlegroups.com
Hi all,
I'm writing a small test spider that only scrapes 5 pages of a website and then shuts itself down.
To do so, I'm using the standard extension scrapy.contrib.closespider.CloseSpider, with CLOSESPIDER_PAGECOUNT = 5 defined in settings.py.
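For reference, the configuration described above comes down to a one-line fragment of settings.py. This is a minimal sketch that assumes the CloseSpider extension is left enabled, as it is by default. It is not a copy of the actual project file.

```python
# settings.py (fragment)
# Ask the CloseSpider extension (scrapy.contrib.closespider.CloseSpider in the
# Scrapy version used here) to close the spider once 5 responses have been
# received. The default value, 0, means "no page limit".
CLOSESPIDER_PAGECOUNT = 5
```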
My spider indeed closes itself, but only after having crawled 20 pages:
{'downloader/request_bytes': 5932,
'downloader/request_count': 20,
'downloader/request_method_count/GET': 20,
'downloader/response_bytes': 693738,
'downloader/response_count': 20,
'downloader/response_status_count/200': 19,
'downloader/response_status_count/302': 1,
'finish_reason': 'closespider_pagecount',
'finish_time': datetime.datetime(2013, 3, 25, 13, 1, 27, 182406),
'item_scraped_count': 18,
'log_count/DEBUG': 44,
'log_count/INFO': 4,
'request_depth_max': 2,
'response_received_count': 19,
'scheduler/dequeued': 20,
'scheduler/dequeued/memory': 20,
'scheduler/enqueued': 75,
'scheduler/enqueued/memory': 75,
'start_time': datetime.datetime(2013, 3, 25, 13, 1, 23, 337858)}
Is this behaviour normal?
Thanks in advance
--
Balthazar