How to parse gzip-compressed HTML


曹帅

Oct 26, 2010, 4:16:45 AM
to scrapy...@googlegroups.com
Hello, everyone:
I have a question about gzip-compressed HTML. Scrapy has a downloader middleware, HttpCompressionMiddleware; if the response has a Content-Encoding header set to gzip, Scrapy decompresses the HTML with gzip. But I get the following error:


2010-10-26 15:50:29+0800 [ScrapyHTTPPageGetter,client] {'X-Powered-By': ['ASP.NET'], 'Transfer-Encoding': ['chunked'], 'Content-Encoding': ['gzip'], 'Vary': ['Accept-Encoding'], 'Server': ['Microsoft-IIS/6.0'], 'Connection': ['close'], 'Date': ['Tue, 26 Oct 2010 07:50:29 GMT'], 'Content-Type': ['text/html']}
2010-10-26 15:50:29+0800 [nf.nfdaily.cn] ERROR: Crawling <http://nf.nfdaily.cn/spqy/default.htm>: [Failure instance: Traceback: <type 'exceptions.IOError'>: Not a gzipped file
    /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:354:_startRunCallbacks
    /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:371:_runCallbacks
    /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:280:callback
    /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:354:_startRunCallbacks
    --- <exception caught here> ---
    /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:371:_runCallbacks
    /media/WORK/ace_news/scrapy/core/downloader/middleware.py:75:process_response
    /media/WORK/ace_news/scrapy/contrib/downloadermiddleware/httpcompression.py:21:process_response
    /media/WORK/ace_news/scrapy/contrib/downloadermiddleware/httpcompression.py:30:_decode
    /usr/lib/python2.6/gzip.py:212:read
    /usr/lib/python2.6/gzip.py:255:_read
    /usr/lib/python2.6/gzip.py:156:_read_gzip_header
    ]

Can anyone give me some ideas about this error? Thanks!

Pablo Hoffman

Oct 26, 2010, 11:10:43 AM
to scrapy...@googlegroups.com
This works for me:

scrapy shell "http://nf.nfdaily.cn/spqy/default.htm"

What Scrapy version are you using? Please consider upgrading to 0.10.3.
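
If upgrading does not help, one way to narrow this down is to check whether the body really is gzip data. The "Not a gzipped file" error comes from Python's gzip module when the payload does not start with the gzip magic bytes, which can happen if a server sends a Content-Encoding: gzip header with an uncompressed body. Below is a minimal sketch in Python 3 of that check. It is not Scrapy's own code; `maybe_gunzip` and `GZIP_MAGIC` are hypothetical names used only for illustration.

```python
import gzip
import io

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body, content_encoding):
    """Decompress body only if the header says gzip AND the data looks like gzip.

    body: raw response bytes
    content_encoding: value of the Content-Encoding header ("" if absent)
    """
    if content_encoding.strip().lower() != "gzip":
        # The server did not claim gzip, so leave the body alone.
        return body
    if not body.startswith(GZIP_MAGIC):
        # The header claims gzip but the payload is not gzip data;
        # return it unchanged instead of raising an error.
        return body
    with gzip.GzipFile(fileobj=io.BytesIO(body)) as f:
        return f.read()


if __name__ == "__main__":
    html = b"<html>hello</html>"
    print(maybe_gunzip(gzip.compress(html), "gzip"))  # really compressed
    print(maybe_gunzip(html, "gzip"))                 # mislabelled plain body
```

Running the same check on the raw bytes returned by the failing URL would show whether the header and the payload disagree.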
