Running multiple spiders in Scrapy

shiva krishna

Jun 8, 2012, 2:26:41 AM
to scrapy...@googlegroups.com
Hi,

  1. In Scrapy, suppose I have two URLs that contain different HTML. I want to write an individual spider for each and run both spiders at once. Is it possible to run multiple spiders at once in Scrapy?

  2. After writing multiple spiders, how can I schedule them to run every 6 hours (like cron jobs)?

I have no idea how to do the above; can you suggest how to do these things, with an example?

Thanks in advance.

Tsouras

Jun 8, 2012, 5:45:29 AM
to scrapy-users
I know the answer only for question 2.
In Linux you can use cron to call a script every 6 hours.

You can make the following simple bash script:

#!/bin/bash
# Run the spider from the project directory, sending stderr to log.txt
cd /the/path/of/my/project
scrapy crawl mySpider 2>log.txt

Don't forget to make this script executable, e.g. with chmod +x /the/path/of/my/script

Then you can edit the crontab file using the following command:
crontab -e

and add the following line:
0 0,6,12,18 * * * /the/path/of/my/script

This means your script will run every day at 00:00, 06:00, 12:00 and 18:00.
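
If you also want to capture the output cron itself produces when running the script, you can redirect it in the crontab entry (a sketch; the log path is hypothetical):

0 0,6,12,18 * * * /the/path/of/my/script >> /tmp/myspider-cron.log 2>&1

This appends both stdout and stderr to the log file, which helps when debugging why a scheduled crawl did not run.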



Pablo Hoffman

Jun 8, 2012, 5:26:19 PM
to scrapy...@googlegroups.com
The answer for 1 would be:

1. Write the spiders (in the development environment of your choice).
2. Test and debug the spiders with "scrapy crawl".
3. Once the spiders are ready, deploy them to a Scrapyd server.
4. Use Scrapyd's schedule.json API to schedule two spider runs, one for each URL, passing the URL as a spider argument (see the sketch below).
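
For example, a minimal sketch of steps 3 and 4, assuming a project named myproject and a single spider named myspider whose __init__ reads a url argument (the names and URLs here are hypothetical):

# Step 3: deploy the project to the Scrapyd target defined in scrapy.cfg
scrapy deploy

# Step 4: schedule two runs of the same spider, one per URL. Any extra -d
# parameter is passed through to the spider as a spider argument.
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider -d url=http://example.com/page1
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider -d url=http://example.com/page2

Scrapyd hands each extra parameter to the spider as a keyword argument, so the spider can build its start_urls from the url it receives.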

Pablo.

Enix Shen

Jun 10, 2012, 11:53:24 AM
to scrapy...@googlegroups.com
I schedule Scrapy scripts through the Jenkins trigger plugins.

Besides the basic timer trigger, which works like a Linux cron job, there are many other trigger plugins you can choose from.

I use a URL content trigger in my project, because the content of http://mysite.com/version changes once the nightly build completes.
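
The Jenkins job itself only needs to run the crawl when the trigger fires; a sketch of such a build step (the path and spider name are hypothetical):

#!/bin/bash
# Build step executed by Jenkins: run the crawl from the project
# directory, logging stderr to log.txt, same as the cron script above.
cd /path/to/project
scrapy crawl myspider 2>log.txt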

Thanks,
ENix

shiva krishna

Jun 11, 2012, 2:45:40 AM
to scrapy...@googlegroups.com
@Pablo: Thanks Pablo, I have deployed to Scrapyd and I am able to run all the spiders at once.
It's working fine.

Tonal

Jun 21, 2012, 3:27:49 AM
to scrapy...@googlegroups.com
Why not use SpiderManager.find_by_request and BaseSpider.handles_request to start multiple spiders?



shiva krishna

Jun 21, 2012, 3:34:20 AM
to scrapy...@googlegroups.com
@Tonal: I don't know how to use those methods. If you could provide an example of how to do this, I could get a basic idea of SpiderManager.find_by_request and BaseSpider.handles_request.

Thanks in advance.


shiva krishna

Jul 9, 2012, 3:04:35 AM
to scrapy...@googlegroups.com
Deploying a project to Scrapyd works successfully, but is the command for running multiple spiders something like the one below?

curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider1,spider2,spider3...

Is this the right way to schedule multiple spiders in Scrapyd?

 

shiva krishna

Jul 9, 2012, 5:12:07 AM
to scrapy...@googlegroups.com
Hi Pablo,

I have deployed the project to the Scrapyd server, and now I have multiple spiders in my project. I can't use the following command to run all the spiders, right?

curl http://localhost:6800/schedule.json -d project=myproject -d spider=spider2

So what should I do now? Do I need to create another target, with one spider per URL, like below?

[deploy:target1]
        For this: curl http://localhost:6800/schedule.json -d project=target1 -d spider=spider1
[deploy:target2]
        For this: curl http://localhost:6800/schedule.json -d project=target2 -d spider=spider2

If I do that, it is the same as running the "scrapy crawl spider1" command. Please show me how to run multiple spiders in a single project using Scrapyd.

Thanks in advance
