Where is the item returned by the spider?

Zeynel

Nov 20, 2009, 11:39:04 AM
to scrapy-users
When I run the spider in the command prompt with

python scrapy-ctl.py crawl whitecase.com

I see the FirmItem among the returned items:

[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP -
Lawyers - Rachel B. Wagner '])

but where is this item? How do I get it and put it into a file?

Thank you for the help.

Pablo Hoffman

Nov 20, 2009, 12:52:18 PM
to scrapy...@googlegroups.com
You do that by writing an item pipeline:
http://doc.scrapy.org/topics/item-pipeline.html

You can use the built-in File Export Pipeline for that:
http://doc.scrapy.org/topics/item-pipeline.html#module-scrapy.contrib.pipeline.fileexport

It uses the Item Exporters:
http://doc.scrapy.org/topics/exporters.html

Or you can implement your own item pipeline, either using the Item Exporters or
from scratch.

Pablo.
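
For readers wondering what "from scratch" might look like, here is a minimal sketch. It is not the pipeline from the dpaste link, and it does not import Scrapy: a pipeline component is essentially a class with a process_item method. The argument order (item, spider), the items.csv default path, and the close() helper are assumptions for illustration; the process_item signature has changed between Scrapy releases, so check the docs for your version.

```python
import csv


class CsvWritePipeline(object):
    """Hypothetical pipeline component that writes each item's title to a CSV file."""

    def __init__(self, path='items.csv'):
        # newline='' lets the csv module control line endings itself.
        self.file = open(path, 'w', newline='')
        self.writer = csv.writer(self.file)

    def process_item(self, item, spider):
        # In the log above, title is a list holding one string.
        self.writer.writerow([item['title'][0].strip()])
        self.file.flush()
        # Return the item so later pipeline components can still see it.
        return item

    def close(self):
        # Hypothetical helper; call it when the crawl is finished.
        self.file.close()
```

Nothing is written when this module is merely imported or run on its own. Rows appear only when something calls process_item, which during a crawl is the framework's job.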

Zeynel

Nov 20, 2009, 1:08:51 PM
to scrapy-users
Thanks! I just started to study those. But what is wrong with this
pipeline?

http://dpaste.org/HibR/

Do I need something more than that? If not, what do I need to do to
make this one work?




Pablo Hoffman

Nov 20, 2009, 1:20:40 PM
to scrapy...@googlegroups.com
On Fri, Nov 20, 2009 at 10:08:51AM -0800, Zeynel wrote:
> Thanks! I just started to study those. But what is wrong with this
> pipeline?
>
> http://dpaste.org/HibR/
>
> Do I need something more than that? If not, what do I need to do to
> make this one work?

Add it to your project's item pipelines by putting this in your settings.py:

ITEM_PIPELINES = [
    'myproject.pipelines.CsvWritePipeline',
]

Zeynel

Nov 20, 2009, 2:01:46 PM
to scrapy-users
Yes, thanks. That line is there. I still don't understand how to get
the scraped data into items.csv.

I ran pipelines.py in IDLE, but nothing happened. There is an
items.csv file, but it is empty :)

Zeynel

Nov 20, 2009, 6:59:16 PM
to scrapy-users
Hi,

Thanks for your help with this. The first sentence here
http://doc.scrapy.org/topics/item-pipeline.html says

"After an item has been scraped by a spider it is sent to the Item
Pipeline which process it through several components that are executed
sequentially."

So, I didn't have to do anything :) When the crawl runs, the pipeline
opens items.csv and writes the scraped data there. Next, I'll use the
File Export Pipeline with CSV.
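
To make the quoted sentence concrete, here is a toy sketch of items passing through several components in order. It is a stand-in written for illustration, not Scrapy's engine: run_pipeline, StripTitle and CollectTitles are hypothetical names, and the (item, spider) argument order is an assumption that may differ in your Scrapy version.

```python
class StripTitle:
    """Hypothetical component: trims whitespace from every title string."""

    def process_item(self, item, spider):
        item['title'] = [t.strip() for t in item['title']]
        return item


class CollectTitles:
    """Hypothetical component: remembers every title it has seen."""

    def __init__(self):
        self.seen = []

    def process_item(self, item, spider):
        self.seen.extend(item['title'])
        return item


def run_pipeline(items, components, spider=None):
    """Pass each item through every component, one after another."""
    processed = []
    for item in items:
        for component in components:
            # Each component receives whatever the previous one returned.
            item = component.process_item(item, spider)
        processed.append(item)
    return processed
```

In a real project the crawl command plays the role of run_pipeline, which is why running pipelines.py by itself leaves the output file empty: the class gets defined, but nothing feeds it items.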
