How to make multiple items from a single item coming into a pipeline?

jan.j...@gmail.com

Mar 30, 2014, 3:58:12 PM
to scrapy...@googlegroups.com
Hi,

In my data cleansing and post-processing, which takes place in pipelines, I sometimes hit a situation where it makes sense to split one item into several separate ones. Doing this in the spider would be very difficult, so I really need to do it in a pipeline. However, process_item returns only a single value; it is not a generator. What would be the best way to split an item in a pipeline?

Thanks,
Honza

Pablo Hoffman

Apr 17, 2014, 4:49:04 PM
to scrapy-users
You can't do that in a pipeline; you'd have to write a spider middleware instead.
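
For illustration, here is a minimal sketch of what such a spider middleware might look like. It relies on the process_spider_output hook, which receives everything the spider yields and may yield more entries than it received. The class name, the split_item helper, and the 'variants' field are all hypothetical, and the sketch assumes items are plain dicts; the actual splitting rule depends on your data.

```python
class SplitItemMiddleware:
    """Spider middleware sketch: fans one scraped item out into several."""

    def process_spider_output(self, response, result, spider):
        # `result` is the iterable of items and requests the spider produced.
        for entry in result:
            if isinstance(entry, dict) and entry.get("variants"):
                # Assumption: items are plain dicts carrying a 'variants' list.
                yield from split_item(entry)
            else:
                # Requests and items without variants pass through untouched.
                yield entry


def split_item(item):
    """Hypothetical rule: emit one item per entry in item['variants']."""
    base = {key: value for key, value in item.items() if key != "variants"}
    for variant in item["variants"]:
        new_item = dict(base)      # copy the shared fields
        new_item.update(variant)   # overlay the variant-specific fields
        yield new_item
```

To try it, the class would be registered under the SPIDER_MIDDLEWARES setting with whatever module path it lives at in your project. Because the split happens before items reach the pipelines, each resulting item then goes through process_item on its own.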



Bill Ebeling

Apr 18, 2014, 4:13:16 PM
to scrapy...@googlegroups.com
I've got a spider that creates several items once the main item is created. I simply create a new Item(), copy the original item into it, then reset the values that differ. I add each item to a list, then yield them in a loop.

    #...item definitions above
    prod2 = ScrapeItem()
    for elem in product:
        prod2[elem] = product[elem]

    prod2['model'] = prod2['model'] + '1'
    prod2['price'] = float(prod2['price']) * 0.75
    prod2['type'] = 2

    for p in [product, prod2]:
        yield p


Hope that helps a little.