GAE Python Chinese feed url fetch problem


Alan Xing

Nov 19, 2009, 2:13:46 AM
to google-a...@googlegroups.com
Hi there,

We have run into a feed fetch problem: our GAE Python code throws DownloadError. It seems to happen with many feeds that contain Chinese characters. Almost any feed from feedsky.com triggers it.


Is this a known problem? Any suggestions for a fix or workaround are appreciated!

Thanks,
Alan

  File "D:\Python25\lib\urllib2.py", line 381, in open
    response = self._open(req, data)
  File "D:\Python25\lib\urllib2.py", line 399, in _open
    '_open', req)
  File "D:\Python25\lib\urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "D:\Python25\lib\urllib2.py", line 1107, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "D:\Python25\lib\urllib2.py", line 1080, in do_open
    r = h.getresponse()
  File "D:\Program Files\Google\google_appengine\google\appengine\dist\httplib.py", line 203, in getresponse
    self._allow_truncated, self._follow_redirects)
  File "D:\Program Files\Google\google_appengine\google\appengine\api\urlfetch.py", line 241, in fetch
    return rpc.get_result()
  File "D:\Program Files\Google\google_appengine\google\appengine\api\apiproxy_stub_map.py", line 478, in get_result
    return self.__get_result_hook(self)
  File "D:\Program Files\Google\google_appengine\google\appengine\api\urlfetch.py", line 325, in _get_fetch_result
    raise DownloadError(str(err))
DownloadError: ApplicationError: 2 ['\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec}
\xe9S\x1bY\xb2\xef\xe7\xf6_Q\xd7\x13w\xdc\x13m\xd0\x02bk\xa0oO\x8f\xbb_\xbf\xe9\xc5\xd1
\xf6\x8c\xa3\xe3\xc6\r\x87\x90JBc!i\xb4\x18s\xe3\xc5\r\t\x10\x88U\xc2fG\x98

Ikai L (Google)

Nov 20, 2009, 2:40:37 PM
to google-a...@googlegroups.com
It looks like feedsky may have trouble responding to requests that send the "Accept-Encoding: gzip" header. I changed my urlfetch call to work like this:

content = urlfetch.fetch(url, headers={ "Accept-Encoding" : "identity" })

It has not failed on me once. I also tried the following experiments. I'm just using straight curl here, no fancy programs. Note how I am cut off sometimes:

computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   100k      0  0:00:01  0:00:01 --:--:--  127k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  133k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  133k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  132k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  132k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  133k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 21  180k   21 39886    0     0   6885      0  0:00:26  0:00:05  0:00:21     0
curl: (18) transfer closed with 145287 bytes remaining to read
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  132k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: gzip" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 21  180k   21 39886    0     0   6879      0  0:00:26  0:00:05  0:00:21     0
curl: (18) transfer closed with 145287 bytes remaining to read

Here's the same test passing a different "Accept-Encoding" header:

computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   115k      0  0:00:01  0:00:01 --:--:--  154k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   103k      0  0:00:01  0:00:01 --:--:--  132k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   115k      0  0:00:01  0:00:01 --:--:--  155k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   104k      0  0:00:01  0:00:01 --:--:--  133k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   116k      0  0:00:01  0:00:01 --:--:--  155k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   116k      0  0:00:01  0:00:01 --:--:--  153k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   115k      0  0:00:01  0:00:01 --:--:--  154k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0  99141      0  0:00:01  0:00:01 --:--:--  121k
computer@computer:/tmp$ curl -i -H "Accept-Encoding: identity" http://feed.feedsky.com/qiushi > blah
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  180k  100  180k    0     0   104k      0  0:00:01  0:00:01 --:--:--  133k

Granted, this is a pretty small sample size and not an exhaustive test, but I'd bet money right now that something is up with feedsky.com's GZIP response.
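The failing curl runs above stop partway through the transfer, so the client is left holding an incomplete gzip stream. Here is a minimal sketch of how one might check whether a downloaded body is a complete gzip stream before trying to parse it. It uses only the Python 3 standard library (not the Python 2.5 runtime from the traceback), and the helper names `looks_like_gzip` and `gunzip_if_complete` are made up for illustration:

```python
import gzip
import zlib

# Every gzip stream starts with these two bytes; the DownloadError payload
# in the traceback begins with the same '\x1f\x8b' prefix.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(body):
    """Return True if the body starts with the gzip magic bytes."""
    return body[:2] == GZIP_MAGIC


def gunzip_if_complete(body):
    """Return the decompressed bytes, or None if the stream is cut off or corrupt."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        data = decompressor.decompress(body)
    except zlib.error:
        return None  # not gzip at all, or corrupt data
    # eof is only True once the end of the compressed stream has been reached,
    # so a truncated transfer leaves it False.
    return data if decompressor.eof else None


if __name__ == "__main__":
    complete = gzip.compress(b"<rss>example feed</rss>")
    truncated = complete[: len(complete) // 2]
    print(gunzip_if_complete(complete) is not None)
    print(gunzip_if_complete(truncated) is None)
```

A check like this only tells you that a response is bad; it does not fix the server, so you would still need a retry or a different encoding.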


--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=.



--
Ikai Lan
Developer Programs Engineer, Google App Engine

Alan Xing

Dec 20, 2009, 11:27:28 AM
to google-a...@googlegroups.com
Ikai, you bet your money correctly; we confirmed your diagnosis. The pain point, though, is that quite a few popular feed service providers have this kind of problem. Sorry for the late reply.

Ikai L (Google)

Dec 21, 2009, 12:52:20 PM
to google-a...@googlegroups.com
It shouldn't be that bad. Just pass "identity" as the Accept-Encoding header. The only tradeoff is that you are going to pay a higher bandwidth cost. There's nothing else you can do in your application: if you consistently get bad data in one format but not in a more expensive one, go for the more expensive format unless you can throw away bad responses.
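If you would rather throw away bad responses than always pay for uncompressed transfers, something like the following might work. This is only a sketch under assumptions: `fetch` is a hypothetical callable you supply that takes `(url, headers)`, returns the raw response body as bytes, and raises IOError on a failed download. On App Engine it would presumably wrap `urlfetch.fetch` and translate DownloadError, but that wrapper is not shown here. It is written against the Python 3 standard library.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def fetch_feed(url, fetch):
    """Try gzip first; fall back to identity if the gzip response is unusable.

    `fetch` is a hypothetical callable: fetch(url, headers) -> bytes,
    raising IOError when the download fails.
    """
    try:
        body = fetch(url, {"Accept-Encoding": "gzip"})
    except IOError:
        body = None  # download failed outright; fall through to identity

    if body is not None:
        if not body.startswith(GZIP_MAGIC):
            # The server ignored the gzip request and sent the plain body.
            return body
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            data = decompressor.decompress(body)
            if decompressor.eof:
                return data  # complete gzip stream
        except zlib.error:
            pass  # corrupt stream; discard it

    # Bad or missing gzip response: pay the bandwidth cost instead.
    return fetch(url, {"Accept-Encoding": "identity"})
```

The cost of this design is a second request whenever the gzip response is bad. For a provider that fails as often as the curl runs above suggest, always sending "identity" is simpler.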
