Hey Ivanov,
I'm not sure whether you received my private mail from the 11th, so
here it is again:
Hey Ivanov,
I can point you in the right direction, but really, it's all there in
the docs.
Pipelines are a really easy concept: Every Item that is scraped (i.e.
yielded or returned) by the Spider is given to the process_item() method
of all pipelines. This method can then inspect and modify the item and
must do one of two things:
- if it returns the Item, the item will be processed by the next
  pipeline or, if there is no further pipeline, go on to the feed
  exports (see
  http://doc.scrapy.org/en/latest/intro/tutorial.html#storing-the-scraped-data)
- if it raises scrapy.exceptions.DropItem, this particular item will
  stop being processed, end of story. You can use this if you want to
  filter your items for certain characteristics.
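To make the two outcomes concrete, here is a minimal sketch of a filtering
pipeline. The class name PriceFilterPipeline and the "price" field are made
up for illustration, and the fallback DropItem class is only there so the
snippet also runs where Scrapy is not installed:

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Stand-in so this sketch runs without Scrapy installed (assumption for
    # illustration only; in a real project use Scrapy's own DropItem).
    class DropItem(Exception):
        pass


class PriceFilterPipeline:
    """Hypothetical pipeline: drops items without a price, normalizes the rest."""

    def process_item(self, item, spider):
        if not item.get("price"):
            # Outcome 2: stop processing this item entirely.
            raise DropItem("missing price")
        # Inspect/modify the item, then...
        item["price"] = float(item["price"])
        # Outcome 1: hand the item on to the next pipeline (or feed export).
        return item
```

Items with a price come out with the price converted to a float; items
without one never reach later pipelines or the feed exports.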
There are a couple of extra methods you *can* implement if you want,
e.g. to open/close files or database connections, but literally all that
a pipeline *must* do is have a process_item() method. All methods, their
signatures, and their use cases are explained here:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline
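As a rough sketch of those optional methods, the pipeline below opens a
file in open_spider(), closes it in close_spider(), and writes one JSON
line per item in between. The class name and the default file name
"items.jl" are my own choices for illustration, not something Scrapy
prescribes:

```python
import json


class JsonLinesWriterPipeline:
    """Hypothetical pipeline that writes each item as one JSON line."""

    def __init__(self, path="items.jl"):
        # The default path is an assumption for this sketch.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider is opened.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider is closed.
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        # Return the item so later pipelines still see it.
        return item
```

The same open/close pattern applies to database connections: connect in
open_spider(), disconnect in close_spider().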
The most common use case for pipelines is to write scraped data to a
database. The docs have an example for MongoDB:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-mongodb
You can have multiple pipelines, and the items will be processed in the
order given by the ITEM_PIPELINES setting in your settings.py file
(pipelines with lower values run first), as explained here:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#activating-an-item-pipeline-component
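For example, a settings.py entry could look roughly like this. The module
path "myproject.pipelines" and both class names are placeholders, not
anything from your project:

```python
# settings.py (fragment)
# Keys are import paths of pipeline classes (hypothetical names here);
# values are integers that set the order, lower values run first.
ITEM_PIPELINES = {
    "myproject.pipelines.StoragePipeline": 800,
    "myproject.pipelines.FilterPipeline": 300,
}

# Only to illustrate the resulting order; Scrapy does this sorting itself.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
```

Here FilterPipeline sees every item first, and StoragePipeline only gets
the items that were not dropped.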
Whether you need item pipelines at all really depends on what you want
to do.
Cheers,
-Jakob