pause/resume/stop spider, not entire engine


vitsin

Jul 4, 2011, 2:35:36 PM
to scrapy-users
hi,
is there any way to pause/resume/stop a particular spider, rather than the
entire Scrapy engine?
thanks,
--vs

vitsin

Jul 5, 2011, 12:18:56 PM
to scrapy-users
Scrapy developers, would you consider adding such a feature, please?
--vs

Pablo Hoffman

Jul 12, 2011, 9:01:14 AM
to scrapy...@googlegroups.com
We're not only considering it, but also working on it.

There are currently two working patches in my MQ that add this functionality in
case anyone wants to try an early preview (they need to be applied in order):

http://hg.scrapy.org/users/pablo/mq/file/tip/scheduler_single_spider.patch
http://hg.scrapy.org/users/pablo/mq/file/tip/persistent_scheduler.patch

To run a spider as before (no persistence):

scrapy crawl thespider

To run a spider storing scheduler+dupefilter state in a dir:

scrapy crawl thespider --set SCHEDULER_DIR=run1

During the crawl, you can hit ^C to cancel the crawl and resume it later with:

scrapy crawl thespider --set SCHEDULER_DIR=run1


The SCHEDULER_DIR setting name is bound to change before the final release, but
the idea will be the same: you pass a directory where the state is persisted.

Pablo.

> --
> You received this message because you are subscribed to the Google Groups "scrapy-users" group.
> To post to this group, send email to scrapy...@googlegroups.com.
> To unsubscribe from this group, send email to scrapy-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/scrapy-users?hl=en.


massabuntu

Jul 19, 2011, 3:57:10 PM
to scrapy...@googlegroups.com
Hi,
against which version of Scrapy do I have to apply the patches?

Pablo Hoffman

Jul 19, 2011, 6:14:23 PM
to scrapy...@googlegroups.com
These two patches:

http://hg.scrapy.org/users/pablo/mq/file/tip/scheduler_single_spider.patch
http://hg.scrapy.org/users/pablo/mq/file/tip/persistent_scheduler.patch

apply cleanly on trunk right now (in that order).

On Tue, Jul 19, 2011 at 12:57:10PM -0700, massabuntu wrote:
> Hi,
> against which version of scrapy i have to patch?
>


Martino Massalini

Jul 19, 2011, 6:33:53 PM
to scrapy...@googlegroups.com
Sorry, but I'm out of my comfort zone here with patches.

Am I right with this?

1) Clone the code from the latest Mercurial repository (hg clone http://hg.scrapy.org/scrapy)
2) Apply the patches in that order (can you type out the commands?)
3) Build the source (python setup.py install)


Pablo Hoffman

Jul 20, 2011, 12:21:54 PM
to scrapy...@googlegroups.com
Yes, something like that:

hg clone http://hg.scrapy.org/scrapy
wget http://hg.scrapy.org/users/pablo/mq/raw-file/b926f44f1aaf/scheduler_single_spider.patch
wget http://hg.scrapy.org/users/pablo/mq/raw-file/b926f44f1aaf/persistent_scheduler.patch
patch -p1 < scheduler_single_spider.patch
patch -p1 < persistent_scheduler.patch
python setup.py install

I wouldn't install Scrapy system-wide (with setup.py install); instead, point
the Python path to the directory where you cloned it, so that Python finds it.

Also, the patches are still in development and will change, so you may need to
repeat these steps in the future.
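Pablo's suggestion, illustrated with a self-contained sketch (the package name and paths here are made up; in practice you would point at the directory of your Scrapy clone, e.g. via the PYTHONPATH environment variable):

```python
import os
import sys
import tempfile

# Stand-in for the directory containing the patched checkout; in
# practice this would be the "scrapy" clone produced by `hg clone`.
checkout = tempfile.mkdtemp()
pkg = os.path.join(checkout, "patchedpkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("version = 'patched'\n")

# Prepending the checkout to sys.path is the in-process equivalent
# of `export PYTHONPATH=/path/to/checkout` before running scrapy.
sys.path.insert(0, checkout)

import patchedpkg
print(patchedpkg.version)  # patched
```

This way the interpreter picks up the patched code without touching the system-wide installation, so re-applying updated patches later only requires refreshing the clone.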

> --
> You received this message because you are subscribed to the Google Groups "scrapy-users" group.

> To view this discussion on the web visit https://groups.google.com/d/msg/scrapy-users/-/wDHmnXNqI70J.

Martino Massalini

Jul 21, 2011, 12:15:42 PM
to scrapy...@googlegroups.com
Is this ok?

root@cherryvps:~/Development/TarBalls# patch -p1 < scheduler_single_spider.patch
patching file scrapy/contrib/dupefilter.py
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 17.
2 out of 2 hunks FAILED -- saving rejects to file scrapy/contrib/dupefilter.py.rej
patching file scrapy/core/engine.py
Hunk #1 FAILED at 52.
Hunk #2 FAILED at 81.
Hunk #3 FAILED at 126.
Hunk #4 FAILED at 157.
Hunk #5 FAILED at 182.
Hunk #6 FAILED at 225.
Hunk #7 FAILED at 258.
Hunk #8 FAILED at 268.
8 out of 8 hunks FAILED -- saving rejects to file scrapy/core/engine.py.rej
patching file scrapy/core/scheduler.py
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file scrapy/core/scheduler.py.rej


And this?


root@cherryvps:~/Development/TarBalls# patch -p1 < persistent_scheduler.patch
patching file docs/faq.rst
Hunk #1 FAILED at 87.
1 out of 1 hunk FAILED -- saving rejects to file docs/faq.rst.rej
patching file docs/topics/settings.rst
Hunk #1 FAILED at 444.
Hunk #2 FAILED at 804.
2 out of 2 hunks FAILED -- saving rejects to file docs/topics/settings.rst.rej
The next patch would delete the file scrapy/contrib/dupefilter.py,
which does not exist!  Assume -R? [n] n
Apply anyway? [n] n
Skipping patch.
1 out of 1 hunk ignored
patching file scrapy/contrib/spiders/crawl.py
Hunk #1 FAILED at 6.
Hunk #2 FAILED at 38.
Hunk #3 FAILED at 48.
3 out of 3 hunks FAILED -- saving rejects to file scrapy/contrib/spiders/crawl.py.rej
patching file scrapy/core/scheduler.py
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file scrapy/core/scheduler.py.rej
patching file scrapy/dupefilter.py
patching file scrapy/settings/default_settings.py
Hunk #1 FAILED at 81.
Hunk #2 FAILED at 217.
2 out of 2 hunks FAILED -- saving rejects to file scrapy/settings/default_settings.py.rej
patching file scrapy/tests/test_dupefilter.py
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file scrapy/tests/test_dupefilter.py.rej
patching file scrapy/tests/test_utils_queue.py
patching file scrapy/tests/test_utils_reqser.py
patching file scrapy/utils/queue.py
patching file scrapy/utils/reqser.py

Thank you very much!
