Error/Bug with tornado.httpclient


Fabio[mbutubuntu]Buda

Sep 18, 2011, 3:30:52 PM
to Tornado Web Server
I've discovered an error with a URL when requested with httpclient.
Almost all URLs work without any problem (including HTTPS, which caused
errors in previous versions), but this URL
"http://forums.mysql.com/read.php?10,146117,146155" causes an exception.

That page is a standard page and I can't work out why HTTPClient raises
an exception. I'm not 100% sure the problem is in Tornado, but my
HTTPClient code works correctly for many other URLs, and the traceback
makes me think Tornado is the problem.

This is the traceback:

[W 110918 19:27:09 simple_httpclient:261] uncaught exception
Traceback (most recent call last):
  File "/usr/local/src/tornado-2.0/tornado2/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/simple_httpclient.py", line 259, in cleanup
    yield
  File "/usr/local/src/tornado-2.0/tornado2/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/stack_context.py", line 183, in wrapped
    callback(*args, **kwargs)
  File "/usr/local/src/tornado-2.0/tornado2/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/simple_httpclient.py", line 344, in _on_chunk_length
    self._on_chunk_data)
  File "/usr/local/src/tornado-2.0/tornado2/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/iostream.py", line 153, in read_bytes
    self._check_closed()
  File "/usr/local/src/tornado-2.0/tornado2/lib/python2.7/site-packages/tornado-2.0-py2.7.egg/tornado/iostream.py", line 403, in _check_closed
    raise IOError("Stream is closed")
IOError: Stream is closed

Alek Storm

Sep 18, 2011, 4:14:24 PM
to python-...@googlegroups.com
With the following code, I'm not getting an error:

from tornado.httpclient import AsyncHTTPClient, HTTPClient
AsyncHTTPClient.configure('tornado.simple_httpclient.SimpleAsyncHTTPClient')
client = HTTPClient()
response = client.fetch("http://forums.mysql.com/read.php?10,146117,146155")

Could you elaborate on exactly how you're fetching the URL?

Ben Darnell

Sep 18, 2011, 5:33:57 PM
to python-...@googlegroups.com
I can't reproduce the problem myself, but this is most likely the chunked-close bug, which is dependent on network timing.  It's fixed in the soon-to-be-released version 2.1.  

-Ben

Phil Whelan

Sep 18, 2011, 5:59:21 PM
to python-...@googlegroups.com
Sounds like the same problem I was encountering, which was solved by grabbing the latest from GitHub.

Phil
--
Cell : +1 (778) 233-4935
#219-1628 W 1st Ave, Vancouver, BC V6J 1G1
Twitter : http://www.twitter.com/philwhln
LinkedIn : http://ca.linkedin.com/in/philwhln
Blog : http://www.philwhln.com
Skype : philwhelan76

Fabio[mbutubuntu]Buda

Sep 19, 2011, 6:33:55 AM
to Tornado Web Server
I sometimes have problems with encoding and sometimes with "IOError:
Stream is closed". The problems aren't reliably reproducible, but this
link "http://forums.mysql.com/read.php?10,146117,146155" gives errors
again. Where is the problem in my code?

This is my code:

import hashlib
import uuid

import lxml.html as jq
import tornado.httpclient
import tornado.web

[...]

class someClass(BaseHandler):
    @tornado.web.asynchronous
    def post(self):
        urlink = self.get_argument("url")
        self.set_header("Content-Type", "application/javascript")

        http = tornado.httpclient.AsyncHTTPClient()
        self.reqUrl = urlink
        http.fetch(urlink, callback=self.on_response,
                   headers={"Accept-Encoding": "gzip,deflate"})

    def on_response(self, response):
        if response.error:
            raise tornado.web.HTTPError(500)
        html = jq.document_fromstring(response.body)
        html.make_links_absolute(self.reqUrl, resolve_base_href=True)
        uid = str(uuid.uuid1())
        sha = hashlib.sha1()
        sha.update(uid)
        myUid = sha.hexdigest()
        theImg = "none"
        theTitle = "none"

        # take the src of the first <img> on the page, if any
        try:
            imgs = html.xpath("//img")
            if imgs:
                theImg = imgs[0].attrib['src']
        except KeyError:
            theImg = "none"

        # `title` was undefined in the original snippet; the page's
        # <title> element is presumably what was meant
        title = html.find(".//title")
        print title.text_content().replace("\n", "")
        theTitle = title.text_content().replace("\n", "")

        self.write("[{\"tle\":\"" + theTitle + "\",\"img\":\"" + theImg +
                   "\",\"uid\":\"" + myUid + "\"}]")
        self.finish()
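As a side note on the handler above: building the JSON response by concatenating strings is fragile, since a quote or backslash in the scraped title would produce invalid JSON. Here is a minimal sketch using the standard library's json module; the build_response helper and the sample values are hypothetical, not part of the original code:

```python
import json

# Hypothetical helper standing in for the hand-built string in on_response;
# json.dumps escapes quotes, backslashes and newlines automatically.
def build_response(title, img, uid):
    return json.dumps([{"tle": title, "img": img, "uid": uid}])

print(build_response('He said "hi"', "none", "abc123"))
```

This keeps the output valid even when the scraped title contains characters that would break the hand-assembled string.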




Fabio[mbutubuntu]Buda

Sep 19, 2011, 8:28:33 AM
to Tornado Web Server
[You can find the code, the traceback, and the URL's response headers on
GitHub: https://github.com/netdesign/python_tornado_test]

I've also tried a simple program to make sure Tornado is the problem;
this is the code:

import tornado.httpclient
http = tornado.httpclient.HTTPClient()
res = http.fetch("http://forums.mysql.com/read.php?10,146117,146155")

Traceback:

WARNING:root:uncaught exception
Traceback (most recent call last):
  File "tornado/simple_httpclient.py", line 289, in cleanup
    yield
  File "tornado/stack_context.py", line 183, in wrapped
    callback(*args, **kwargs)
  File "tornado/simple_httpclient.py", line 384, in _on_chunk_length
    self._on_chunk_data)
  File "tornado/iostream.py", line 171, in read_bytes
    self._check_closed()
  File "tornado/iostream.py", line 493, in _check_closed
    raise IOError("Stream is closed")
IOError: Stream is closed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tornado/httpclient.py", line 85, in fetch
    response.rethrow()
  File "tornado/httpclient.py", line 357, in rethrow
    raise self.error
IOError: Stream is closed



Ben Darnell

Sep 21, 2011, 10:58:27 PM
to python-...@googlegroups.com
OK, I can reproduce this now.  I'm working on a proper fix, but if you need a quick workaround increasing read_chunk_size in the IOStream constructor seems to do the trick.

-Ben

Ben Darnell

Sep 22, 2011, 3:35:09 AM
to python-...@googlegroups.com
I've just checked in a change that fixes the problem, at least for me.  Can you give it a try and see if it works for you?


-Ben

Fabio[mbutubuntu]Buda

Sep 22, 2011, 12:59:06 PM
to Tornado Web Server
Wonderful! I've tested the latest Git version and it works perfectly;
HTTPClient no longer raises any exception (I made some tens of requests,
no errors). Just one question, Ben: was the IOError caused by the absence
of the Content-Length header?
Anyway, thank you for your work.

On 22 Sep, 09:35, Ben Darnell <b...@bendarnell.com> wrote:
> https://github.com/facebook/tornado/commit/8572cc40a1c514ac828f04ce67...

Ben Darnell

Sep 22, 2011, 1:20:48 PM
to python-...@googlegroups.com
No, it's normal to not have a Content-Length header if you're using Transfer-Encoding: chunked.  The problem here is that the last several chunks of the response arrived together along with the notification that the server had closed the connection, and the close event was handled before all the chunks had been read.

-Ben
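Ben's description can be illustrated with a toy decoder for the chunked wire format (an illustration only, not Tornado's actual implementation; the decode_chunked helper is hypothetical). Each chunk is prefixed with its size in hex and the body ends with a zero-length chunk, so even after the server closes the connection, every chunk already sitting in the read buffer must still be parsed out before the close is honored:

```python
# Toy decoder for HTTP chunked transfer encoding (illustration only, not
# Tornado's implementation). Each chunk is "<size-in-hex>\r\n<data>\r\n";
# a zero-size chunk terminates the body.
def decode_chunked(buf):
    body = b""
    while True:
        header, _, rest = buf.partition(b"\r\n")
        size = int(header.split(b";")[0], 16)  # chunk-size line, hex; ignore extensions
        if size == 0:
            return body  # terminating zero-length chunk
        body += rest[:size]
        buf = rest[size + 2:]  # skip the chunk data and its trailing CRLF

# All three chunks may arrive in one read, together with the close event;
# a correct client drains them all before treating the stream as closed.
raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(raw))  # b'Wikipedia'
```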

Fabio[mbutubuntu]Buda

Sep 22, 2011, 1:39:11 PM
to python-...@googlegroups.com
OK, in my opinion saying that an HTTP response has no Content-Length header is the same as saying that the connection uses chunked transfer encoding... anyway, the problem is what I thought: only a small number of pages suffered from it, because chunked transfer encoding is not widely deployed yet.

Ben, have you ever done any memory-leak testing against Tornado? I'm finding that memory usage grows logarithmically as the number of AsyncHTTPClient requests grows... but that's a topic for a brand new thread :-)

Ben Darnell

Sep 22, 2011, 1:56:14 PM
to python-...@googlegroups.com
Chunked encoding has been quite widely supported for years, although the two months-old chunk-related bugs I've just fixed show that it can be kind of tricky.

It's been a while since I've done a thorough memory-leak investigation; if you've found anything suspicious, please post more details to the list.

-Ben