Run scrapy from the script


ivanb

Apr 3, 2012, 4:05:39 AM
to scrapy-users
I made a couple of scrapers, some basic stuff, and I'm now looking
into creating a UI for them, so they can be run with a single click.
I would appreciate some guidelines on where to look, and any example
of something similar. I guess the first and main step is running
scrapy from a script; the rest shouldn't be a problem.

I'm looking for advice on how to run a scraper from a script. Someone
has surely done something like this before, so I could see what can
be done with it. Please share any advice.

Thanks

ivanb

Apr 3, 2012, 6:02:41 AM
to scrapy-users
I've tried this http://snippets.scrapy.org/snippets/13/#c14

But when I put it into my spider, it first gives me an error:

pickle.PicklingError: Can't pickle <function remove at 0x019FC1F0>:
it's not found as weakref.remove

and afterwards it seems to start my spider, but it can no longer see
my path: it says ImportError: No module named scraper.items, where
before it worked properly.

Any ideas?

ivanb

Apr 11, 2012, 9:20:32 AM
to scrapy-users
Does anybody have some experience with this? Did anyone get it to
start from a script?

Steven Almeroth

Apr 16, 2012, 12:55:09 PM
to scrapy...@googlegroups.com
Have you tried http://snippets.scrapy.org/snippets/7/

Self-contained script to crawl a site [updated: scrapy 13.0dev] 

Jozef Celuch

Apr 18, 2012, 2:15:46 PM
to scrapy...@googlegroups.com
As far as I know this is not supported anymore; you're probably going to have to use scrapyd for this kind of thing. http://doc.scrapy.org/en/latest/topics/scrapyd.html
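Scrapyd exposes a small JSON-over-HTTP API, so kicking off a run from your own UI code is just an HTTP POST. A stdlib-only sketch — the project and spider names are placeholders, and it assumes scrapyd is running on its default port 6800 with your project already deployed:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPYD = "http://localhost:6800"

def schedule(project, spider, **spider_args):
    """POST to scrapyd's schedule.json endpoint and return the raw JSON reply."""
    data = urlencode({"project": project, "spider": spider, **spider_args}).encode()
    with urlopen(f"{SCRAPYD}/schedule.json", data=data) as resp:
        return resp.read()

# schedule("myproject", "myspider")  # needs a live scrapyd instance
```

You can then poll listjobs.json the same way to see whether the run has finished.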

To view this discussion on the web visit https://groups.google.com/d/msg/scrapy-users/-/hc8G8FKk-jMJ.



--
JC

Michael Stone

Dec 10, 2016, 6:15:22 PM
to scrapy-users
Hi Ivanb,
I'm facing the same problem.

I'm trying to use Tornado as the web service: it serves a page with a submit field, passes the URL the user entered to scrapy, runs my spider, and loads the items into MongoDB. At the same time, the web page keeps fetching items from MongoDB so users don't have to wait until the spider is done.
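One way to keep the web handler responsive is to not run the spider in the web process at all: launch `scrapy crawl` as a subprocess and let the page poll MongoDB, as you describe. This is only a sketch — the spider name and the `url` spider argument are assumptions about your project, not something from this thread:

```python
import subprocess

def build_crawl_command(spider_name, url):
    # argv for `scrapy crawl`, passing the user's URL in as a spider argument
    return ["scrapy", "crawl", spider_name, "-a", f"url={url}"]

def launch_crawl(spider_name, url):
    # Popen returns immediately, so the web handler can respond while the
    # spider keeps writing items to MongoDB in the background.
    return subprocess.Popen(build_crawl_command(spider_name, url))
```

The Tornado handler then just calls launch_crawl(...) and returns; the page polls the database for new items as they arrive.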

I've googled a lot, and I'm new to Python, so I'm having a hard time.

I hope you've solved this ahead of me; I'm desperate for a solution.

Thanks. 

Adam Morris

Dec 19, 2016, 4:19:06 AM
to scrapy-users
I use this bit of code to run a spider programmatically.

import os

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def run_scraper(path_to_project, spider_name, **kwargs):
    """Run a spider defined in a scrapy project."""
    os.chdir(path_to_project)  # get_project_settings() needs scrapy.cfg in the cwd
    settings = get_project_settings()
    process = CrawlerProcess(settings)
    process.crawl(spider_name, **kwargs)  # kwargs are passed to the spider as arguments
    process.start()  # blocks until the crawl finishes

