exceptions.IOError: cannot identify image file


Mahmoud Abdel-Fattah

Aug 20, 2012, 2:38:34 AM
to scrapy...@googlegroups.com
Hello,

I keep getting the following error, and it gives me neither the image file name nor the response URL, so I can't track it down:
exceptions.IOError: cannot identify image file

How can I solve this? It stops my spider once the number of errors reaches the limit I defined in settings.py.

Thank you,
Mahmoud

Mahmoud Abdel-Fattah

Oct 10, 2012, 6:00:12 AM
to scrapy...@googlegroups.com
Is there any update on, or solution for, this problem?

Pablo Hoffman

Oct 11, 2012, 12:54:28 PM
to scrapy...@googlegroups.com
Do you have the full traceback?


Mahmoud Abdel-Fattah

Dec 17, 2012, 11:35:28 AM
to scrapy...@googlegroups.com
Sure, here it is:
2012-12-16 02:09:16+0000 [cairobooks] Unhandled Error
	Traceback (most recent call last):
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 362, in callback
	    self._startRunCallbacks(result)
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 458, in _startRunCallbacks
	    self._runCallbacks()
	  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 545, in _runCallbacks
	    current.result = callback(current.result, *args, **kw)
	--- <exception caught here> ---
	  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 199, in media_downloaded
	    checksum = self.image_downloaded(response, request, info)
	  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 252, in image_downloaded
	    for key, image, buf in self.get_images(response, request, info):
	  File "/usr/lib/pymodules/python2.7/scrapy/contrib/pipeline/images.py", line 261, in get_images
	    orig_image = Image.open(StringIO(response.body))
	  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
	    raise IOError("cannot identify image file")
	exceptions.IOError: cannot identify image file


Regards,
Mahmoud

Daniel Graña

Jan 2, 2013, 9:06:08 AM
to scrapy...@googlegroups.com
Hi Mahmoud, 

The error means that the response body is not a valid image, at least as far as the Python Imaging Library (PIL) is concerned.
Try to identify the image URL and check whether you can open it with PIL outside Scrapy.

Most likely the image content is invalid, the format is not recognized by PIL, or the server returns HTML instead of an image when a referer or a cookie is not set.

In any case, to debug the error you need to know the URL of the failing image.
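
As a rough way to tell those causes apart once you have a suspect body saved to disk, here is a minimal sketch that looks only at the first bytes. The function name `sniff_body` and its return labels are made up for illustration, it covers just JPEG, PNG and GIF, and it does not replace actually opening the file with PIL:

```python
def sniff_body(body: bytes) -> str:
    """Guess what a downloaded body contains from its leading bytes.

    Hypothetical helper, not part of Scrapy or PIL. It only knows a few
    well-known file signatures plus a crude check for HTML.
    """
    if not body:
        return "empty"
    if body.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if body[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # An HTML error or login page usually starts with a doctype or <html> tag.
    head = body[:512].lstrip().lower()
    if head.startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"
```

If this reports "html" for a URL that should be an image, the referer or cookie explanation becomes the likely one. If it reports "unknown", the format may simply be one PIL does not handle.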

Mahmoud Abdel-Fattah

unread,
Jan 2, 2013, 11:17:30 AM1/2/13
to scrapy...@googlegroups.com
The problem is that Scrapy doesn't include the image URL or the item's response URL in the error traceback, so I don't know exactly which images or pages cause this problem!

Best,
Mahmoud
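
One possible way to get the URL into the log is to wrap the pipeline's `get_images` step, which is where the traceback above fails. The sketch below is an assumption-laden illustration, not something confirmed in this thread: the mixin name `LogFailedImageUrlMixin` is made up, it assumes `get_images` takes `(response, request, info)` as shown in the traceback, and it uses `yield from`, which needs Python 3 (on Python 2.7 a plain `for ... yield` loop would be needed instead).

```python
import logging

logger = logging.getLogger(__name__)


class LogFailedImageUrlMixin:
    """Hypothetical mixin: log the request URL when get_images() fails."""

    def get_images(self, response, request, info, *args, **kwargs):
        try:
            # get_images() is a generator in the images pipeline, so the
            # IOError surfaces while iterating, inside this try block.
            yield from super().get_images(response, request, info, *args, **kwargs)
        except Exception:
            logger.error(
                "Image could not be processed: url=%s status=%s",
                request.url,
                getattr(response, "status", None),
            )
            # Re-raise so the pipeline's normal error handling is unchanged.
            raise


# Assumed usage (module path differs between Scrapy versions; the traceback
# above shows scrapy.contrib.pipeline.images, newer releases use
# scrapy.pipelines.images):
#
#   class MyImagesPipeline(LogFailedImageUrlMixin, ImagesPipeline):
#       pass
```

The error is still raised after logging, so it continues to count toward any error limit set in settings.py. The only difference is that each failure now comes with the URL needed to reproduce it.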