Hello everyone,
As the title says, I have a problem downloading images. Here is my code.
pipelines.py:

    import scrapy
    from scrapy.pipelines.images import ImagesPipeline


    class MyImagePipeline(ImagesPipeline):
        # Browser-like headers sent with every image request.
        headers = {
            'Host': 'cdn.autodoc.de',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'Pragma': 'no-cache',
            'Cache-Control': 'no-cache',
        }

        def get_media_requests(self, item, info):
            for image_url in item['image_urls']:
                # r = requests.get(image_url, stream=True)
                #
                # if r.ok:
                #     with open('/home/dimitris/stock/Dropbox/cargr/autoparts/images/%s.png' % str(uuid.uuid4()),
                #               'wb') as pic:
                #         for chunk in r:
                #             pic.write(chunk)
                yield scrapy.Request(image_url, headers=self.headers)
I intentionally left the requests code in there. I tried the requests library in a terminal, and the pictures download properly without even changing the User-Agent.
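For anyone who wants to repeat that manual check outside Scrapy, here is a minimal sketch of the saving step from the commented-out code. The helper name save_chunks and its parameters are made up for illustration; it accepts any iterable of byte chunks, so it can be tried without touching the network.

```python
import os
import uuid


def save_chunks(chunks, directory, extension='.png'):
    """Write an iterable of byte chunks to a uniquely named file.

    Returns the path of the file that was written. The name and signature
    of this helper are hypothetical, not part of Scrapy or requests.
    """
    path = os.path.join(directory, '%s%s' % (uuid.uuid4(), extension))
    with open(path, 'wb') as pic:
        for chunk in chunks:
            pic.write(chunk)
    return path
```

With the requests library, the chunks could come from something like requests.get(image_url, stream=True).iter_content(8192), which is roughly what the commented-out block does.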
Somewhere in my crawler class I have:

    pic = response.xpath('//div[@class="image"]/span/img/@src').extract()
    item['image_urls'] = pic
which returns:

    'image_urls': [u'http://cdn.autodoc.de/thumb?id=7079085&lng=en'],
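As a side check, extracted src values can be normalised before they reach the pipeline. This is only a sketch under an assumption that is not confirmed here: that some values might be relative or carry stray whitespace. The helper name clean_image_urls and the example page URL are hypothetical.

```python
from urllib.parse import urljoin


def clean_image_urls(page_url, srcs):
    """Strip whitespace, drop empty values, and resolve each src against page_url."""
    cleaned = []
    for src in srcs:
        src = src.strip()
        if src:
            # urljoin leaves absolute URLs untouched and resolves relative ones.
            cleaned.append(urljoin(page_url, src))
    return cleaned
```

In the spider this would be called as item['image_urls'] = clean_image_urls(response.url, pic). On Python 2 the import would be from urlparse import urljoin instead.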
In my items.py I have:

    image_urls = scrapy.Field()
    images = scrapy.Field()
settings.py:

    ITEM_PIPELINES = {
        'autoparts.pipelines.AutopartsPipeline': 700,
        'autoparts.pipelines.MyImagePipeline': 600,
    }
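For reference, a sketch of the settings an images pipeline relies on. Scrapy's ImagesPipeline is only enabled when IMAGES_STORE is set, and it needs Pillow installed. The settings shown in this post do not say whether IMAGES_STORE is already set, so the directory below is a placeholder assumption, not the real path.

```python
# settings.py (sketch)

# Lower numbers run first, so the image pipeline (600) runs before
# AutopartsPipeline (700).
ITEM_PIPELINES = {
    'autoparts.pipelines.MyImagePipeline': 600,
    'autoparts.pipelines.AutopartsPipeline': 700,
}

# Assumption: hypothetical directory. Without IMAGES_STORE the
# images pipeline is disabled at startup.
IMAGES_STORE = '/path/to/images'
```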
In the terminal I just see this error
I have also tried replacing https with http; the browser returns the same picture either way.
Any suggestions would be appreciated :)
Thanks