exceptions.IOError: cannot identify image file

Mahmoud Abdel-Fattah

Aug 20, 2012, 2:38:34 AM

to scrapy...@googlegroups.com

Hello,

I keep getting the following error, but it doesn't include the image file name or the response URL, so I can't track it down:

exceptions.IOError: cannot identify image file

How can I solve this? It stops my spider once the number of errors reaches the limit I defined in settings.py.

Thank you,

Mahmoud

Mahmoud Abdel-Fattah

Oct 10, 2012, 6:00:12 AM

to scrapy...@googlegroups.com

Any update on, or solution for, this problem?

Pablo Hoffman

Oct 11, 2012, 12:54:28 PM

to scrapy...@googlegroups.com

Do you have the full traceback?

Mahmoud Abdel-Fattah

Dec 17, 2012, 11:35:28 AM

to scrapy...@googlegroups.com

Sure, here it is:

2012-12-16 02:09:16+0000 [cairobooks] Unhandled Error
	Traceback (most recent call last):
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
	    self._startRunCallbacks(result)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
	    self._runCallbacks()
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	--- <exception caught here> ---
	  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 199, in media_downloaded
	    checksum = self.image_downloaded(response, request, info)
	  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 252, in image_downloaded
	    for key, image, buf in self.get_images(response, request, info):
	  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 261, in get_images
	    orig_image = Image.open(StringIO(response.body))
	  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
	    raise IOError("cannot identify image file")
	exceptions.IOError: cannot identify image file

Regards,
Mahmoud

Daniel Graña

Jan 2, 2013, 9:06:08 AM

to scrapy...@googlegroups.com

Hi Mahmoud,

The error says that the response body is not a valid image, at least as far as the Python Imaging Library (PIL) is concerned.

Try to identify the image URL and check whether you can open it using PIL from outside Scrapy.

Most likely the image content is invalid, the format is not recognized by PIL, or the server returns HTML instead of an image when the Referer header or a cookie is not set.

In any case, to debug the error you need to know the URL of the failing image.
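
As a rough illustration of the three cases above (invalid content, unrecognized format, HTML returned instead of an image), the sketch below inspects the first bytes of a downloaded body using only the standard library. The function name `describe_body` and the set of signatures checked are assumptions made for this example; it is not PIL's detection logic and covers only JPEG, PNG and GIF.

```python
def describe_body(body):
    """Return a rough label for raw response bytes (hypothetical helper)."""
    if not body:
        return "empty"
    # Well-known file signatures for three common image formats.
    if body.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if body[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # An HTML error or login page usually shows a tag near the start.
    head = body[:512].lower()
    if b"<!doctype html" in head or b"<html" in head:
        return "html"
    return "unknown"


if __name__ == "__main__":
    print(describe_body(b"<html><body>Forbidden</body></html>"))
```

A result of "html" would point at the missing Referer or cookie case, while "unknown" suggests corrupt content or a format outside this short list. Once the bytes are saved to a file, opening that file with PIL's `Image.open` outside Scrapy is the more direct check.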

Mahmoud Abdel-Fattah

Jan 2, 2013, 11:17:30 AM

to scrapy...@googlegroups.com

The problem is that Scrapy doesn't include the image URL or the item's response URL in the error traceback, so I don't know exactly which images or pages cause this problem!

Best,
Mahmoud
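
One possible way to get the missing URL into the log, sketched under assumptions: the traceback shows the failure inside the pipeline's `get_images(response, request, info)`, so a subclass could wrap that call and log `request.url` before re-raising. `LogImageUrlMixin` is a hypothetical name invented for this example, and the code is written for Python 3 (on the Python 2.7 setup in the traceback, `super()` would need explicit arguments). It is not something the thread's participants proposed.

```python
import logging

logger = logging.getLogger(__name__)


class LogImageUrlMixin(object):
    """Hypothetical mixin: log the request URL when image decoding fails."""

    def get_images(self, response, request, info, *args, **kwargs):
        try:
            # Delegate to the next class in the MRO (the real pipeline)
            # and pass its results through unchanged.
            for result in super().get_images(
                response, request, info, *args, **kwargs
            ):
                yield result
        except IOError as exc:
            # IOError is what PIL raised in the traceback above.
            logger.warning(
                "Image could not be decoded: url=%s body_size=%d error=%s",
                request.url, len(response.body), exc,
            )
            raise  # keep the original failure behaviour
```

Assuming the pipeline class from the traceback (`ImagesPipeline` in `scrapy/contrib/pipeline/images.py`), the mixin would be combined as `class LoggingImagesPipeline(LogImageUrlMixin, ImagesPipeline): pass` and listed in `ITEM_PIPELINES` in place of the stock images pipeline. The exception is re-raised, so the error still counts as before; the only change is the extra log line naming the URL.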
