How to give Scrapy a URL list to replace start_urls

kom

Jul 1, 2010, 7:36:53 AM
to scrapy-users
Hi all,
My URLs are not fixed; I need to get the URL list from another Python
app. How can I assign the new list to start_urls?

Thanks.

Pablo Hoffman

Jul 1, 2010, 10:24:14 AM
to scrapy...@googlegroups.com
It depends on how you're running your spider.

If you're constructing the spider somewhere you could pass it the start_urls in
the constructor:

spider = MySpider(start_urls=THE_URLS)

Otherwise you could make the spider load the URLs itself in the constructor:

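from scrapy.spider import BaseSpider

class MySpider(BaseSpider):

    def __init__(self):
        super(MySpider, self).__init__()
        # THE_URLS: the list obtained from the other application
        self.start_urls = THE_URLS

    # the other spider methods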
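
THE_URLS in both snippets is a placeholder for whatever list the other application produces. A minimal sketch of one way to build it, assuming the other app writes one URL per line to a text file; the load_start_urls helper and the file layout are hypothetical and not part of Scrapy:

```python
import os
import tempfile


def load_start_urls(path):
    """Read one URL per line, skipping blank lines and '#' comments."""
    urls = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    return urls


# Demo file standing in for the output of the other application (assumption).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("http://example.com/a\n\n# skipped\nhttp://example.com/b\n")

THE_URLS = load_start_urls(demo_path)
os.remove(demo_path)
```

The resulting list can then be passed as MySpider(start_urls=THE_URLS) or assigned to self.start_urls in the constructor.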


Pablo.

Mingchin Hsieh

Jul 1, 2010, 10:23:46 AM
to scrapy...@googlegroups.com
I would suggest you take a look at [scrapy install
directory]/tests/test_engine.py, lines 83-88.

~Z

kom

Jul 2, 2010, 2:28:36 AM
to scrapy-users
Thanks a lot, I will have a look,
but it seems a little complex to me.
I want to start Scrapy with this command: python scrapy-ctl.py crawl
example.com
Is there another way to do it?

Wilmer

Jul 2, 2010, 4:10:06 AM
to scrapy-users

> I want to start Scrapy with this command: python scrapy-ctl.py crawl
> example.com
> Is there another way to do it?

What I did was create a new setting called START_URLS in the
settings.py file.
Alternatively, you can set it on the command line, using the
"scrapy-ctl.py settings --set <value>" command in the latest Scrapy v0.9.

Then for start_urls (in the spider), just read it from the settings:

start_urls = settings.getlist("START_URLS")

Don't forget to import settings from scrapy.conf.
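
A rough sketch of how the START_URLS value turns into a list, using a stand-in function rather than Scrapy itself so it runs on its own. It assumes getlist returns list values unchanged and splits a plain string on commas, which is how I understand it to behave; parse_url_setting is a hypothetical name, and the whitespace stripping is an addition of this sketch:

```python
def parse_url_setting(value):
    """Mimic the assumed getlist behaviour: lists pass through, strings split on ','."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    # Stripping whitespace and dropping empty parts is an extra of this sketch.
    return [part.strip() for part in str(value).split(",") if part.strip()]


# In settings.py the value can be written as a real list ...
from_settings_file = parse_url_setting(
    ["http://example.com/a", "http://example.com/b"]
)
# ... while a value given on the command line arrives as a single string.
from_command_line = parse_url_setting("http://example.com/a, http://example.com/b")
```

Note that under this assumption a URL that itself contains a comma would be split in two, so the settings.py list form is the safer choice for such URLs.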


Wilmer

kom

Jul 2, 2010, 9:58:08 AM
to scrapy-users
Thanks a lot, that's great.

Let me ask one more thing:
how can I start a spider other than with the command: python scrapy-ctl.py
crawl example.com

kom

Jul 3, 2010, 9:28:28 AM
to scrapy-users
OK, I found it.
Like this:

spider = MySpider()
scrapymanager.configure()
scrapymanager.runonce(spider)