Using ItemLoader with images (or files)


BastRoq

Jan 3, 2017, 9:44:28 AM
to scrapy-users
Hi everybody,

I have a spider that used to work nicely, but to make the code cleaner I want to use an ItemLoader to populate my item. My item is pretty simple:

class ProductItem(scrapy.Item):
    title = scrapy.Field()
    ean = scrapy.Field() 
    images = scrapy.Field()
    image_urls = scrapy.Field()

The fields are populated through a ProductLoader as follows:

class ProductLoader(ItemLoader):

    default_output_processor = TakeFirst()
    title_in = MapCompose()
    title_out = StripString()
    ean_in = MapCompose()
    ean_out = MapCompose(ean_or_none)



My problem comes as soon as I try to load the images, which the XPath selector returns as follows:

['/asset/36/88/image1-XL.jpg', '/asset/36/88/image2-XL.jpg']


If I use the ItemLoader to populate 'image_urls', the image pipeline is not used: the images are not downloaded and the 'images' attribute is not populated, although both worked before I introduced the ProductLoader.

Is it possible to invoke the image pipeline through the ItemLoader?

Bastien

Jan 4, 2017, 2:52:19 AM
to scrapy-users
OK, I didn't have a sufficient grasp of how it works. My first mistake was to call the ItemLoader with the wrong argument: Item instead of Item(). I forgot the parentheses: `l = ItemLoader(item=Item(), response=response)`

Second, I misunderstood the way it works. The ItemLoader is only there to populate the Item. Once that is done and the parse method returns the item built by l.load_item(), the pipelines registered in the settings do their job: downloading the files, populating the item.images attribute, and so on.