Random DOWNLOAD_DELAY in autothrottle extension is not working

tim feirg

Jul 30, 2014, 11:58:03 AM
to scrapy...@googlegroups.com
I'm currently using Scrapy 0.25 on an Ubuntu server. I've added these lines to my settings.py:

CONCURRENT_REQUESTS_PER_DOMAIN = 3
RETRY_TIMES = 10
DOWNLOAD_DELAY = 6
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_DEBUG = True
RANDOMIZE_DOWNLOAD_DELAY = True

But the output (after several days of crawling) shows that DOWNLOAD_DELAY is always 6 seconds. This isn't fatal, but it adds uncertainty to this otherwise great framework. Can somebody tell me what's wrong?

Rolando Espinoza La Fuente

Jul 30, 2014, 7:00:57 PM
to scrapy...@googlegroups.com
From the docs:

   The AutoThrottle extension honours the standard Scrapy settings for
   concurrency and delay. This means that it will never set a download delay
   lower than :setting:`DOWNLOAD_DELAY` or a concurrency higher than
   :setting:`CONCURRENT_REQUESTS_PER_DOMAIN`
   (or :setting:`CONCURRENT_REQUESTS_PER_IP`, depending on which one you use).
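The quoted passage describes a floor on the delay. Below is a minimal sketch of that rule, assuming a simplified adjustment step in which the new delay moves halfway toward the observed latency and is then clamped. The function name `next_autothrottle_delay` and the 60 s cap are illustrative assumptions, not Scrapy's actual code. It shows why, with DOWNLOAD_DELAY = 6 and fast responses, the value AutoThrottle reports can sit at 6 s indefinitely.

```python
def next_autothrottle_delay(current_delay: float, latency: float,
                            min_delay: float, max_delay: float = 60.0) -> float:
    """Hypothetical, simplified AutoThrottle step (not Scrapy's implementation).

    Moves the delay halfway toward the observed latency, then clamps it so it
    never drops below min_delay (DOWNLOAD_DELAY) or exceeds max_delay.
    """
    target = (current_delay + latency) / 2.0
    return min(max(min_delay, target), max_delay)


# Fast responses cannot push the delay below DOWNLOAD_DELAY = 6.
print(next_autothrottle_delay(current_delay=6.0, latency=0.3, min_delay=6.0))   # 6.0
# Slow responses can raise it above the floor.
print(next_autothrottle_delay(current_delay=6.0, latency=10.0, min_delay=6.0))  # 8.0
```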

Regards,
Rolando


--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.

tim feirg

Jul 31, 2014, 3:58:49 AM
to scrapy...@googlegroups.com
Sorry, I didn't explain my problem well enough.
What I'm saying is that, according to the docs, DOWNLOAD_DELAY should be a random number:

This setting is also affected by the RANDOMIZE_DOWNLOAD_DELAY setting (which is enabled by default). By default, Scrapy doesn’t wait a fixed amount of time between requests, but uses a random interval between 0.5 and 1.5 * DOWNLOAD_DELAY.
But in the output the delay is always 6 seconds.
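The rule quoted from the docs can be written as a small function. This is a sketch assuming the interval is drawn uniformly (the docs only say "a random interval"); `randomized_delay` is a hypothetical helper, not a Scrapy API. With DOWNLOAD_DELAY = 6 it yields waits between 3 and 9 seconds.

```python
import random


def randomized_delay(download_delay: float, randomize: bool = True) -> float:
    """Hypothetical helper: the wait before the next request to a domain.

    With randomization on, draw uniformly from
    [0.5 * download_delay, 1.5 * download_delay]; otherwise wait exactly
    download_delay.
    """
    if randomize and download_delay > 0:
        return random.uniform(0.5 * download_delay, 1.5 * download_delay)
    return download_delay


samples = [randomized_delay(6.0) for _ in range(10_000)]
# Both printed values fall within [3.0, 9.0].
print(round(min(samples), 2), round(max(samples), 2))
```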


On Thursday, July 31, 2014 at 7:00:57 AM UTC+8, Rolando Espinoza La fuente wrote:

Daniel Graña

Aug 11, 2014, 9:53:39 AM
to scrapy...@googlegroups.com
Hi Tim Feirg,

The fixed delay is what the AutoThrottle extension logs, but the actual delay is randomized when the download queue is processed, at s.c.downloader.Downloader#132 (scrapy.core.downloader). So, in summary, even if the log message says the delay is 6s, the actual delay is in the 3s-9s range.
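To make the logged-versus-actual distinction concrete, here is a toy model, not Scrapy's code: `Slot`, `delay`, and `next_wait` are hypothetical names. The throttling logic only reads and writes the stored `delay`, which is the value a log line would report, while the randomization happens only when the downloader computes how long to actually wait before the next request.

```python
import random
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy per-domain download slot (hypothetical, for illustration)."""
    delay: float            # base delay; the value a throttle log would report
    randomize: bool = True  # mirrors RANDOMIZE_DOWNLOAD_DELAY

    def next_wait(self) -> float:
        """Delay actually slept before the next request, randomized per call."""
        if self.randomize:
            return random.uniform(0.5 * self.delay, 1.5 * self.delay)
        return self.delay


slot = Slot(delay=6.0)
for _ in range(3):
    wait = slot.next_wait()
    # The logged value stays at 6.0s while the real wait varies within [3, 9].
    print(f"logged delay: {slot.delay:.1f}s, actual wait: {wait:.2f}s")
```

In this model, reading the log alone would suggest a constant 6 s delay, which matches what Tim observed.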

thanks
Daniel 