Downloading images problem


dmtrs

Dec 28, 2016, 7:08:30 AM
to scrapy-users
Hello people,

As the title says, I have a problem downloading images. Here is my code:

pipelines.py

import scrapy
from scrapy.pipelines.images import ImagesPipeline


class MyImagePipeline(ImagesPipeline):

    # Browser-like headers sent with every image request
    headers = {
        'Host': 'cdn.autodoc.de',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache',
    }

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            # r = requests.get(image_url, stream=True)
            #
            # if r.ok:
            #     with open('/home/dimitris/stock/Dropbox/cargr/autoparts/images/%s.png' % str(uuid.uuid4()),
            #               'wb') as pic:
            #         for chunk in r:
            #             pic.write(chunk)

            yield scrapy.Request(image_url, headers=self.headers)

I intentionally left the requests code in there. I tried the requests library in a terminal, and the pictures download properly without even changing the User-Agent.
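For reference, the terminal test mentioned above can be written as a standalone script. This is a minimal sketch, not the exact code that was run: the function name save_image, the out_dir argument, the optional session parameter, and the timeout value are illustrative assumptions. Besides saving the file, it returns the status codes of any redirects that requests followed on the way, which can be compared with what Scrapy reports.

```python
import uuid
from pathlib import Path


def save_image(url, out_dir, session=None):
    """Download `url` into `out_dir`.

    Returns (saved_path, redirect_status_codes). `session` is a hypothetical
    hook for passing a pre-configured requests.Session (or a test stub).
    """
    if session is None:
        # Third-party dependency, imported only when no session is supplied.
        import requests
        session = requests.Session()

    # requests follows redirects for GET by default; each intermediate
    # response is kept in r.history.
    r = session.get(url, stream=True, timeout=30)
    r.raise_for_status()

    # Random file name, as in the commented-out code in the pipeline.
    out_path = Path(out_dir) / ("%s.png" % uuid.uuid4())
    with out_path.open("wb") as pic:
        for chunk in r.iter_content(chunk_size=8192):
            pic.write(chunk)

    return out_path, [resp.status_code for resp in r.history]
```

Calling save_image('http://cdn.autodoc.de/thumb?id=7079085&lng=en', '.') and printing the second return value shows whether a redirect was followed silently before the picture arrived.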


Somewhere in my crawler class I have:

pic = response.xpath('//div[@class="image"]/span/img/@src').extract()
item['image_urls'] = pic



which returns

 'image_urls': [u'http://cdn.autodoc.de/thumb?id=7079085&lng=en'],

In my items.py I have:

    image_urls = scrapy.Field()
    images = scrapy.Field()

settings.py

ITEM_PIPELINES = {
    'autoparts.pipelines.AutopartsPipeline': 700,
    'autoparts.pipelines.MyImagePipeline': 600,
}


In the terminal I see only this error:

2016-12-28 13:03:12 [scrapy.core.engine] DEBUG: Crawled (301) <GET https://cdn.autodoc.de/thumb?id=7079085&lng=en> (referer: None)
2016-12-28 13:03:12 [scrapy.pipelines.files] WARNING: File (code: 301): Error downloading file from <GET https://cdn.autodoc.de/thumb?id=7079085&lng=en> referred in <None
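
One possible reading of this log, offered as an assumption rather than a confirmed diagnosis: Scrapy's media pipelines do not follow redirects by default, so a 301 response is reported as a failed download instead of being followed to the real image. Scrapy 1.4 and later provide a MEDIA_ALLOW_REDIRECTS setting for this case. Assuming such a version is installed (it is newer than this post), a settings.py sketch would be:

```python
# settings.py (assumes Scrapy >= 1.4, where this setting exists)
# Let the files/images pipelines follow HTTP redirects such as the 301 above.
MEDIA_ALLOW_REDIRECTS = True
```

On older versions the setting has no effect, so this is worth checking against the installed Scrapy release first.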




I have also tried replacing https with http; in the browser both return the same picture.

Any suggestion would be appreciated :)

Thanks