httpclient trouble with epoll


Jacob Kristhammar

Nov 6, 2009, 4:02:43 PM
to Tornado Web Server
Hi,

For the last couple of weeks we've been using Tornado and the async
httpclient to develop a server that juggles long-poll requests on one
side and a lot of async requests on the other. This server was
initially developed on a Mac and hence used select, not epoll.
When we moved the server to a Linux (Ubuntu) machine, weird things
started to happen in the httpclient.

Sometimes the httpclient loses track of some of the fds used by
libcurl. This causes the calls to ioloop remove/update/unregister in
httpclient to raise IOError ENOENT (No such file or directory). If there
are still active curl handles in the multi set (as indicated by
num_handles), the httpclient will spin out of control and raise the same
exception every 200 ms (the timeout interval for calling perform). This
happens because the exceptions are unhandled, so httpclient never removes
the fd in question from its set of old fds (_fds).

This situation is easy to reproduce with the following code (NB: only
when epoll is used): http://gist.github.com/228264 (the code is also
pasted at the end of this post).

In one terminal:
python server.py --logging=debug

In another:
python client.py --logging=debug


My guess is that this is caused by epoll removing closed fds from its
set when a socket is closed, while httpclient/libcurl somehow still
holds on to that fd, which leads to the situation above. It also seems
to be related to how libcurl reuses and caches connections.


However, this is not exactly the problem as we experience it. I haven't
managed to reproduce it fully in a controlled way, but it always happens
after a while (I guess we are using the async httpclient quite heavily).
The first error always occurs when libcurl/httpclient tries to reuse a
socket and issues an ioloop.update_handler call changing the fd from
read to write. This always seems to raise an exception in
self.io_loop.update_handler. That is, in one iteration of _perform in
httpclient the fd is in the set of readable fds reported by fdset(), and
in the next iteration it is in the set of writable fds.
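The read-to-write transition described above can be modelled as a diff between two fdset() snapshots. The sketch below is a hypothetical, simplified stand-in for the bookkeeping in httpclient's _perform, not Tornado's actual code. The masks are inferred from the trace further down (readable fds registered with 0x3, i.e. EPOLLIN|EPOLLPRI, and writable fds with 0x4, EPOLLOUT); mapping exceptable fds to EPOLLERR|EPOLLHUP is an assumption.

```python
# Hypothetical model of how an fdset() snapshot turns into IOLoop calls.
# Masks inferred from the debug trace: readable -> 0x3, writable -> 0x4.
EPOLLIN, EPOLLPRI, EPOLLOUT = 0x001, 0x002, 0x004
EPOLLERR, EPOLLHUP = 0x008, 0x010


def events_from_fdset(readable, writable, exceptable):
    """Merge curl_multi_fdset()-style fd lists into {fd: event_mask}."""
    fds = {}
    for fd in readable:
        fds[fd] = fds.get(fd, 0) | EPOLLIN | EPOLLPRI
    for fd in writable:
        fds[fd] = fds.get(fd, 0) | EPOLLOUT
    for fd in exceptable:  # assumption: exceptable -> error/hangup events
        fds[fd] = fds.get(fd, 0) | EPOLLERR | EPOLLHUP
    return fds


def plan_changes(old, new):
    """Return (to_remove, to_add, to_update) needed to go from old to new."""
    to_remove = sorted(fd for fd in old if fd not in new)
    to_add = sorted(fd for fd in new if fd not in old)
    to_update = sorted(fd for fd in new if fd in old and old[fd] != new[fd])
    return to_remove, to_add, to_update


# fd 27 was readable on the previous pass and is writable on this one,
# so it needs an update_handler call: the call that fails in the trace.
old = {27: 0x3, 42: 0x3}
new = events_from_fdset(readable=[42, 43], writable=[27], exceptable=[])
print(new)                     # {42: 3, 43: 3, 27: 4}
print(plan_changes(old, new))  # ([], [43], [27])
```

If fd 27 has already been closed (and its number possibly recycled) by the time the update is applied, that update is exactly where epoll complains.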

I'm starting to run out of ideas, but it seems to be correlated with how
epoll handles closed connections and with libcurl's strategies for
reusing connections.

Before the problem is triggered by update_handler being called with an
fd that no longer exists according to epoll, there is always some
logging from curl saying that a connection died, that it is retrying,
etc.

More info and a trace are available below. Any ideas and feedback are
welcome.


BR,
Jacob


This is what it looks like. The output contains some additional
logging showing the return values of fdset in httpclient and whenever
fds are changed by the ioloop. The exception in the log below is
currently caught and logged manually, hence the "bug place 3" message.
This was also run against a special build of libcurl that prints the fd
of a connection when it's closed. (The same problem occurred with the
standard libcurl too.)

...
DEBUG Connection died, retrying a fresh connect
DEBUG Expire cleared
DEBUG Closing connection #3 (27)
DEBUG Issue another request to this URL: 'http://xxx:3080/xxx

DEBUG > GET /xxx HTTP/1.1
DEBUG > User-Agent: Mozilla/5.0 (compatible; pycurl)
DEBUG > Accept: */*
DEBUG > Accept-Encoding: gzip,deflate
DEBUG > Host: 30930ce6fd01b8e9
DEBUG >
DEBUG > GET /xxx HTTP/1.1
DEBUG > User-Agent: Mozilla/5.0 (compatible; pycurl)
DEBUG > Accept: */*
DEBUG > Accept-Encoding: gzip,deflate
DEBUG > Host: 30930ce6fd01b8e9
DEBUG >
DEBUG > GET /xxx HTTP/1.1
DEBUG > User-Agent: Mozilla/5.0 (compatible; pycurl)
DEBUG > Accept: */*
DEBUG > Accept-Encoding: gzip,deflate
DEBUG > Host: 30930ce6fd01b8e9
DEBUG >
DEBUG About to connect() to xxx port 3080 (#3)
DEBUG Trying 85.235.1.137...
INFO READABLE: [9, 26, 28, 29, 30, 31, 42, 43, 44]
INFO WRITABLE: [27]
INFO EXCPTBLE: []
INFO UPDATING HANDLER FOR FD: 42 EVENTS: 0x3
INFO UPDATING HANDLER FOR FD: 43 EVENTS: 0x3
INFO UPDATING HANDLER FOR FD: 44 EVENTS: 0x3
INFO UPDATING HANDLER FOR FD: 27 EVENTS: 0x4
DEBUG bug place 3
Traceback (most recent call last):
  File "/xxx/tornado/httpclient.py", line 237, in _perform
    self.io_loop.update_handler(fd, events)
  File "/xxx/tornado/ioloop.py", line 128, in update_handler
    self._impl.modify(fd, events | self.ERROR)
IOError: [Errno 2] No such file or directory
...


Example files.

server.py
---------

import logging
import time

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web

from tornado.options import define, options

define("port", default=8888, help="run on the given port", type=int)

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        delay = int(self.get_argument("delay", 0))
        logging.info("delaying %d sec", delay)
        tornado.ioloop.IOLoop.instance().add_timeout(
            time.time() + delay,
            self.async_callback(self.send_response))

    def send_response(self):
        self.write("This mission is too important for me to "
                   "allow you to jeopardize it.\n")
        self.finish()


def main():
    tornado.options.parse_command_line()
    logging.info("listening on port %d", options.port)
    application = tornado.web.Application([
        (r"/", MainHandler),
    ])
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()


if __name__ == "__main__":
    main()


client.py
---------

import logging

import tornado.ioloop
import tornado.httpclient
import tornado.options

url = "127.0.0.1:8888"

def first():
    def handle_response(response):
        logging.info("Response: %s", response.body)
        second()
    http = tornado.httpclient.AsyncHTTPClient()
    logging.info("sending failing request")
    http.fetch(url + "/?delay=2",
               request_timeout=1,
               callback=handle_response)
    logging.info("sending slow request")
    http.fetch(url + "/?delay=1000",
               request_timeout=1000,
               callback=logging.info)


def second():
    def handle_response(response):
        logging.info("Response: %s", response.body)

    http = tornado.httpclient.AsyncHTTPClient()
    logging.info("sending second request")
    http.fetch(url, callback=handle_response)


def main():
    tornado.options.parse_command_line()
    tornado.ioloop.IOLoop.instance().add_callback(first)
    tornado.ioloop.IOLoop.instance().start()


if __name__ == "__main__":
    main()

Elias Torres

Nov 6, 2009, 4:27:39 PM
to python-...@googlegroups.com
I'm getting a similar error. I have a basic Twitter client for posting
updates. It works perfectly on the Mac, but I'm getting this error on
Linux/Debian, although the request itself works just fine.

[I 091106 21:24:14 web:725] 302 POST /foo/ (127.0.0.1) 895.19ms
[D 091106 21:24:14 ioloop:133] Error deleting fd from IOLoop
Traceback (most recent call last):
  File "/home/x/tornado/tornado/ioloop.py", line 131, in remove_handler
    self._impl.unregister(fd)
  File "/home/x/tornado/tornado/ioloop.py", line 308, in unregister
    epoll.epoll_ctl(self._epoll_fd, self._EPOLL_CTL_DEL, fd, 0)
OSError: [Errno 9] Bad file descriptor

-Elias

Jacob Kristhammar

Nov 11, 2009, 6:37:41 AM
to python-...@googlegroups.com
Hi again,

According to one of the libcurl devs, epoll and curl_multi_perform are
not a good combination; the newer curl_multi_socket_action should be
used instead. I've created an initial version of the AsyncHTTPClient
that uses socket_action instead of perform.
So far I've had very good results, and all of the problems I described
before have disappeared.
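For readers unfamiliar with the socket_action API: instead of the application polling curl_multi_fdset() on every pass, libcurl calls a socket callback whenever it wants a socket watched differently, including CURL_POLL_REMOVE when it stops using one. The sketch below is an assumption-labelled illustration, not code from httpclient2. The CURL_POLL_* values mirror libcurl's curl.h, SocketRegistry is a hypothetical name, and `loop` is any object with Tornado-style add_handler/update_handler/remove_handler methods (a recording fake is used so the sketch runs without pycurl).

```python
# Sketch of a socket-callback registry in the style of curl_multi_socket_action.
# CURL_POLL_* values mirror libcurl's curl.h; EPOLL* values mirror Linux epoll.
CURL_POLL_NONE, CURL_POLL_IN, CURL_POLL_OUT, CURL_POLL_INOUT, CURL_POLL_REMOVE = range(5)
EPOLLIN, EPOLLOUT = 0x001, 0x004

CURL_TO_EPOLL = {
    CURL_POLL_NONE: 0,
    CURL_POLL_IN: EPOLLIN,
    CURL_POLL_OUT: EPOLLOUT,
    CURL_POLL_INOUT: EPOLLIN | EPOLLOUT,
}


class SocketRegistry(object):
    """Hypothetical helper keeping an event loop in sync with libcurl's requests."""

    def __init__(self, loop, on_ready):
        self.loop = loop
        self.on_ready = on_ready  # handler the loop calls as on_ready(fd, events)
        self.fds = {}             # fd -> event mask currently registered

    def handle_socket(self, event, fd, multi=None, data=None):
        if event == CURL_POLL_REMOVE:
            # libcurl says it no longer uses this socket: unregister now,
            # rather than discovering later that the fd was closed.
            if fd in self.fds:
                del self.fds[fd]
                self.loop.remove_handler(fd)
            return
        events = CURL_TO_EPOLL[event]
        if fd not in self.fds:
            self.loop.add_handler(fd, self.on_ready, events)
        elif self.fds[fd] != events:
            self.loop.update_handler(fd, events)
        self.fds[fd] = events


class RecordingLoop(object):
    """Stand-in for an IOLoop that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def add_handler(self, fd, handler, events):
        self.calls.append(("add", fd, events))

    def update_handler(self, fd, events):
        self.calls.append(("update", fd, events))

    def remove_handler(self, fd):
        self.calls.append(("remove", fd))


loop = RecordingLoop()
registry = SocketRegistry(loop, on_ready=lambda fd, events: None)
registry.handle_socket(CURL_POLL_OUT, 27)     # connecting: wait until writable
registry.handle_socket(CURL_POLL_IN, 27)      # request sent: wait for the reply
registry.handle_socket(CURL_POLL_REMOVE, 27)  # libcurl is done with fd 27
print(loop.calls)  # [('add', 27, 4), ('update', 27, 1), ('remove', 27)]
```

With pycurl, a method like handle_socket would be registered on the multi handle through its M_SOCKETFUNCTION option; how httpclient2 actually wires it up may differ. The design point is that the registration changes are driven by libcurl's own notifications, so the application gets a chance to unregister an fd instead of diffing fdset() snapshots after the fact (assuming libcurl delivers CURL_POLL_REMOVE before it closes the socket).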

The current version of this httpclient is available as httpclient2 on
my tornado fork http://github.com/sris/tornado/blob/04e67fa358b0ac6da04a14d4d64f406d0c3131df/tornado/httpclient2.py

It's still very immature and there's still work to be done so any
feedback and comments are very much appreciated.

Bret et al., is there any specific reason why you chose to use
multi_perform and not socket_action? Maybe something I've overlooked?

I ran into some issues with timeouts because of libcurl known bug #62:
"CURLOPT_TIMEOUT does not work properly with the regular multi and
multi_socket interfaces. The work-around for apps is to simply remove
the easy handle once the time is up. See also:
http://curl.haxx.se/bug/view.cgi?id=2501457"
Personally I don't really need request timeouts anyway, so I've
currently replaced them with a parameter called idle_timeout that
instead uses low_speed_limit/low_speed_time and times out a connection
if there's no activity for the specified number of seconds. (The
non-async client still needs request_timeout and works with it, so I
removed it only from httpclient2; the standard httpclient can be used
when it's needed.)
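The work-around quoted from bug #62 (remove the easy handle once the time is up) amounts to the application keeping its own deadlines. A minimal sketch, assuming the caller polls expired() from a periodic timer and then removes the returned handles from the multi handle itself; DeadlineTracker and its methods are hypothetical names, and plain strings stand in for curl easy handles.

```python
import heapq
import itertools


class DeadlineTracker(object):
    """Hypothetical per-request deadlines for handles in a curl multi set."""

    def __init__(self):
        self._heap = []                # entries are (deadline, seq, handle)
        self._seq = itertools.count()  # tie-breaker so handles are never compared
        self._active = set()

    def add(self, handle, now, timeout):
        self._active.add(handle)
        heapq.heappush(self._heap, (now + timeout, next(self._seq), handle))

    def finished(self, handle):
        # Completed normally; its heap entry is skipped when it is popped.
        self._active.discard(handle)

    def expired(self, now):
        """Pop and return still-active handles whose deadline has passed."""
        out = []
        while self._heap and self._heap[0][0] <= now:
            _, _, handle = heapq.heappop(self._heap)
            if handle in self._active:
                self._active.discard(handle)
                out.append(handle)
        return out


tracker = DeadlineTracker()
tracker.add("slow", now=0.0, timeout=1.0)
tracker.add("fast", now=0.0, timeout=1.0)
tracker.add("long", now=0.0, timeout=1000.0)
tracker.finished("fast")
print(tracker.expired(now=1.5))  # ['slow']: the caller removes it from the multi handle
```

A lazy-deletion heap keeps finished() cheap; the idle_timeout approach described above avoids this bookkeeping entirely by letting libcurl's low-speed limits do the work.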


I will continue to develop this version of the httpclient and test it
as much as possible.


-- Jacob

stevvooe

Nov 12, 2009, 3:53:33 PM
to Tornado Web Server
This error has nothing to do with epoll vs. select. The epoll.c module
throws an OSError when an fd is not found, while the select.epoll module
throws an IOError.

I filed a bug on this issue here:

http://github.com/facebook/tornado/issues#issue/32

Also, note that Elias's and Jacob's errors are different.

Looking at the code now, Elias's bug should be fixed, because the
OSError is caught. Jacob's bug is still in the main repo, but I made a
small change in a fork that fixes it:

http://github.com/stevvooe/tornado/commit/d85baebc02791ac4ddc3ce4a3d70ac5361fae3ba

I haven't heard anything from the maintainers, though, and there have
been no commits for about three weeks, so I don't know what the status
is.

_steve


Jacob Kristhammar

Nov 12, 2009, 5:35:50 PM
to python-...@googlegroups.com
I've tried your patch and it doesn't solve my issues. For example, the
code I provided in my initial post still ends up spinning out of control
even if IOErrors are caught when modify fails. (This example code raises
an error when an fd is removed, but only when epoll is used.)

You're right that it will catch some errors, but I still suspect that
the problem is rooted in epoll removing closed connections out from
under libcurl's feet. I think it depends on a number of things, such as
reference counts to the sockets in question. I'm not sure, though, and
since I was unable to reproduce it I won't say too much.

What I do know is that epoll and multi_perform are not a recommended mix
(according to Daniel Stenberg of libcurl).

Before digging deeper into these issues, we also patched the weak spots
in the httpclient by catching the thrown errors. What makes me nervous,
though, is that at a later stage we ran into an issue where libcurl was
getting stuck in an infinite loop caused by a bad fd. Again, I haven't
managed to reproduce this fully, but it always happened after a while of
heavy httpclient usage.

I'd love to be proven wrong because it would make life much easier for
me, but simply catching these errors doesn't seem to be the fix.


-- Jacob

Stephen Day

Nov 12, 2009, 9:39:00 PM
to python-...@googlegroups.com
Ok, I ran your code and figured it out. You need to add a stop in your client:

*** client-jacob.py    2009-11-12 18:33:26.000000000 -0800
--- client.py    2009-11-12 18:30:18.000000000 -0800
***************
*** 23,28 ****
--- 23,29 ----

  def second():
      def handle_response(response):
          logging.info("Response: %s", response.body)
+         tornado.ioloop.IOLoop.instance().stop()

 
      http = tornado.httpclient.AsyncHTTPClient()
      logging.info("sending second request")

From there, I also patched the perform loop to catch the looping error:

http://github.com/stevvooe/tornado/commit/b01cf27a3b6a956fca65e184f1ded63e5c875b97

The problem arises from the fact that it loops around and continues to check the dead file descriptors, raising an exception each time. The IOError signals that it is no longer valid, so we ignore it and move on. The code below that does the accounting on this. Problem solved.
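Steve's point is that the accounting has to advance even when an ioloop call fails; otherwise the dead fd stays in _fds and is retried on every pass. Below is a hedged sketch of that idea, not the code from his commit: sync_handlers and the fake loop are hypothetical, and the except clause names both IOError and OSError because the two epoll modules discussed in this thread raise different ones on Python 2 (on Python 3 they are the same class).

```python
import errno
import logging


def sync_handlers(loop, old_fds, new_fds, handler):
    """Apply the diff between two {fd: events} maps, tolerating dead fds.

    Always returns a copy of new_fds, so the caller's bookkeeping moves
    forward even if the loop rejected a call for an fd that is already gone.
    """
    for fd in old_fds:
        if fd not in new_fds:
            try:
                loop.remove_handler(fd)
            except (IOError, OSError):
                logging.debug("fd %d already gone; dropping it", fd, exc_info=True)
    for fd, events in new_fds.items():
        try:
            if fd not in old_fds:
                loop.add_handler(fd, handler, events)
            elif old_fds[fd] != events:
                loop.update_handler(fd, events)
        except (IOError, OSError):
            logging.debug("could not (re)register fd %d", fd, exc_info=True)
    return dict(new_fds)


class DeadFdLoop(object):
    """Fake loop whose remove_handler fails for the fds listed in `dead`."""

    def __init__(self, dead):
        self.dead = set(dead)
        self.calls = []

    def remove_handler(self, fd):
        self.calls.append(("remove", fd))
        if fd in self.dead:
            raise OSError(errno.EBADF, "Bad file descriptor")

    def add_handler(self, fd, handler, events):
        self.calls.append(("add", fd, events))

    def update_handler(self, fd, events):
        self.calls.append(("update", fd, events))


loop = DeadFdLoop(dead=[27])
fds = {27: 0x4}
for _ in range(3):  # three passes of the perform timer
    fds = sync_handlers(loop, fds, {}, handler=None)
print(loop.calls)  # [('remove', 27)]: the dead fd is tried once, not on every pass
```

Whether swallowing the error is enough to keep libcurl itself healthy is exactly what Jacob disputes in the next post; the sketch only shows why the spinning stops.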

_steve

Jacob Kristhammar

Nov 13, 2009, 5:04:32 AM
to python-...@googlegroups.com
This doesn't solve my problems. As I mentioned in my last post, I've
already tried to patch those weak spots and ran into weird libcurl
errors (an infinite loop in Curl_socket_ready in select.c, caused by a poll on a broken fd with a 0 timeout).

But since I can't reproduce this in a controlled way, I'll have to get back to you when I've managed to pin that one down.

Meanwhile I will continue to experiment with the multi_socket_action-driven httpclient, since I've had good results with it and it's the recommended way to use libcurl with epoll.

Also, my example is just a contrived one: the same error arises in real
applications where stopping is not an option, and the stop() call would
cut off the second, long-running request in my example.

-- Jacob

Sergey Konozenko

Dec 10, 2009, 6:51:38 PM
to Tornado Web Server
I'm having the same issue. The setup is a bit simpler, as we only use
the async http client to talk to back-end services. The number of
simultaneous requests executed via the client can be anywhere
from 2 to 15.

I'm running:
-- Linux version 2.6.28-11-server (buildd@crested) (gcc version 4.3.3
(Ubuntu 4.3.3-5ubuntu4) ) #42-Ubuntu SMP Fri Apr 17
-- $ curl --version
curl 7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10
-- python 2.6.2:
file /usr/bin/python2.6
/usr/bin/python2.6: ELF 64-bit LSB executable, x86-64, version 1
(SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15,
stripped

I did the trick of catching both IOError and OSError in ioloop.py
and httpclient.py, which makes the exception messages go away. I suspect
that Jacob is right and this may just hide the issue. I haven't
experienced the infinite loop Jacob is talking about yet. I hope Jacob
will be proved wrong, as the code is going into production in a week :-D

Is there anybody who can look into this issue? Basically, I'm wondering
why the authors didn't run into this issue themselves, as they seemingly
use the client in the same way.

Could it be an issue with the libcurl version? I'm using a slightly
later version than the one the Tornado web site's instructions mention.

Thanks,
Sergey

Bret Taylor

Dec 11, 2009, 6:02:06 AM
to python-...@googlegroups.com
If it is possible to make a small program that reproduces this on your OS, that would be really helpful - I would love to debug this issue.

Sergey Konozenko

Dec 11, 2009, 9:44:35 AM
to Tornado Web Server
Bret,

Good to hear from you.

Jacob, who started this thread, created an example server and client
and included them in the original message. I'm just copying and pasting
them below. Meanwhile, I'll try to see if I can come up with anything
else.

David Novakovic

Dec 19, 2009, 6:13:34 AM
to Tornado Web Server
Hey, we are looking to put some production code live very soon
(tonight) and have come across this issue on Linux (epoll). The server
will be doing about 800k pv/day and growing...

We do really heavy httpclient access...

Does anyone think any of the above fixes will work?

David

David Novakovic

Jan 1, 2010, 4:10:33 AM
to Tornado Web Server
Hey - any movement on this?

I've added the exception handlers suggested above, but still get some
nasty stuff in my logs and have to restart the servers.

D


David Novakovic

Jan 1, 2010, 4:16:09 AM
to Tornado Web Server
The error I'm getting is below, so it happens outside the place where
the previous exception-handler fix applies.

ERROR:root:Exception in callback <bound method AsyncHTTPClient._perform of <tornado.httpclient.AsyncHTTPClient object at 0x8487ccc>>
Traceback (most recent call last):
  File "/home/bendaware/tweete/tornado/ioloop.py", line 238, in _run_callback
    callback()
  File "/home/bendaware/tweete/tornado/httpclient.py", line 214, in _perform
    self.io_loop.remove_handler(fd)
  File "/home/bendaware/tweete/tornado/ioloop.py", line 133, in remove_handler
    self._impl.unregister(fd)


I'll try adding a handler here and see how it goes.

D


David P. Novakovic

Jan 1, 2010, 4:51:55 AM
to Tornado Web Server
OK, chucking that in a suitable try/except helped.

As hinted above, I guess it just makes the error messages go away and may not fix the actual issue.

D

Ben Darnell

Jan 15, 2010, 9:41:54 PM
to python-...@googlegroups.com
I don't think there is a real problem that would be hidden by
swallowing the exception, so the right short-term fix is to catch both
OSError and IOError in IOLoop.remove_handler (Friendfeed's epoll
module throws OSError, while the one in the standard library throws
IOError). In the long run I think tornado's http client should
migrate from curl_multi_perform to curl_multi_socket_action (as in
Jacob's httpclient2.py:
http://github.com/sris/tornado/commits/master/tornado/httpclient2.py),
as socket_action is better suited to stateful polling objects
(epoll/kqueue). I've seen reports of bugs in socket_action in older
versions of libcurl, however, so we should probably proceed cautiously
on that front.
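Ben's short-term fix, sketched in modern Python: a minimal remove_handler that drops the loop's own bookkeeping and then unregisters, tolerating either exception type. MiniLoop and ClosedFdPoller are hypothetical stand-ins rather than Tornado's classes; the debug message is the one that appears in Elias's log. On Python 3, IOError is an alias of OSError, so the tuple only matters on Python 2.

```python
import errno
import logging


class MiniLoop(object):
    """Hypothetical, stripped-down IOLoop: a handler table plus a poller object."""

    def __init__(self, impl):
        self._impl = impl      # anything with register/unregister, e.g. select.epoll()
        self._handlers = {}

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._impl.register(fd, events)

    def remove_handler(self, fd):
        # Forget the handler first, so a failure below cannot leave a stale entry.
        self._handlers.pop(fd, None)
        try:
            self._impl.unregister(fd)
        except (OSError, IOError):
            # FriendFeed's epoll module raises OSError, Python 2.6's select.epoll
            # raises IOError; either way the kernel has already dropped the fd.
            logging.debug("Error deleting fd from IOLoop", exc_info=True)


class ClosedFdPoller(object):
    """Fake poller that behaves as if every fd had already been closed."""

    def register(self, fd, events):
        pass

    def unregister(self, fd):
        raise IOError(errno.EBADF, "Bad file descriptor")


loop = MiniLoop(ClosedFdPoller())
loop.add_handler(9, lambda fd, events: None, 0x1)
loop.remove_handler(9)        # logged at debug level instead of propagating
print(9 in loop._handlers)    # False
```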

The underlying issue, for those of you following along, is that when
there is a timeout, libcurl will close the socket. On the next
iteration AsyncHTTPClient will notice that the socket is no longer in
curl_multi_fdset and unregister it with the epoll object. Since the
file descriptor is no longer valid, epoll throws an exception. We
have to remove the fd from epoll before closing it. There's no way to
do that with the perform api, but it is possible with socket_action.
The kernel internally removes closed file descriptors from any epoll
objects so we're not leaking resources by letting curl close the
socket before we remove it.
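The failure described above can be observed directly with the standard library's select.epoll (Linux only). This sketch registers one end of a socket pair, closes it the way libcurl would, and then asks epoll to modify the registration, roughly what update_handler does. It uses modify() rather than unregister() because, as far as I know, Python's epoll.unregister() silently ignored EBADF before Python 3.9. On systems without epoll the function returns None; modify_after_close is a hypothetical name.

```python
import errno
import select
import socket


def modify_after_close():
    """Return the errno epoll reports when modifying an fd closed behind its back."""
    if not hasattr(select, "epoll"):
        return None                        # epoll only exists on Linux
    ep = select.epoll()
    a, b = socket.socketpair()
    try:
        fd = a.fileno()
        ep.register(fd, select.EPOLLIN)
        a.close()                          # libcurl closes the socket...
        try:
            ep.modify(fd, select.EPOLLOUT) # ...then the loop touches it again
        except (IOError, OSError) as e:
            return e.errno
        return 0
    finally:
        a.close()                          # no-op if already closed
        b.close()
        ep.close()


result = modify_after_close()
if result is not None:
    print(errno.errorcode[result])         # EBADF
```

This is the "Bad file descriptor" in Elias's traceback; nothing leaks, because the kernel already dropped the closed fd from the epoll set.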

-Ben

Ben Darnell

Jan 15, 2010, 11:25:03 PM
to python-...@googlegroups.com
On Fri, Jan 15, 2010 at 6:41 PM, Ben Darnell <ben.d...@gmail.com> wrote:
> I don't think there is a real problem that would be hidden by
> swallowing the exception, so the right short-term fix is to catch both
> OSError and IOError in IOLoop.remove_handler (Friendfeed's epoll

Likewise in httpclient.py:223, which happens if a socket is closed and
then its file descriptor is reused.
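The fd-reuse case is also easy to observe, and it produces the ENOENT ("No such file or directory") from Jacob's original trace rather than EBADF. In this Linux-only sketch (modify_recycled_fd is a hypothetical name), the closed socket's fd number is handed out again to a new socket; because the kernel dropped the old registration when the first socket closed, epoll does not know that number any more. The function returns None if epoll is unavailable or the number happens not to be recycled.

```python
import errno
import select
import socket


def modify_recycled_fd():
    """Return the errno epoll reports when a registered fd number gets reused."""
    if not hasattr(select, "epoll"):
        return None                            # epoll only exists on Linux
    ep = select.epoll()
    a, b = socket.socketpair()
    c = d = None
    try:
        fd = a.fileno()
        ep.register(fd, select.EPOLLIN)
        a.close()                              # kernel removes fd from the epoll set
        c, d = socket.socketpair()             # lowest free fd numbers are reused
        if fd not in (c.fileno(), d.fileno()):
            return None                        # number was not recycled this time
        try:
            ep.modify(fd, select.EPOLLOUT)     # same number, different socket
        except (IOError, OSError) as e:
            return e.errno
        return 0
    finally:
        for s in (a, b, c, d):
            if s is not None:
                s.close()
        ep.close()


result = modify_recycled_fd()
if result is not None:
    print(errno.errorcode[result])             # ENOENT
```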

-Ben

Stephen Day

Jan 16, 2010, 1:15:52 AM
to python-...@googlegroups.com
> The underlying issue, for those of you following along, is that when
> there is a timeout, libcurl will close the socket. On the next
> iteration AsyncHTTPClient will notice that the socket is no longer in
> curl_multi_fdset and unregister it with the epoll object. Since the
> file descriptor is no longer valid, epoll throws an exception.

This is basically what I said originally:

> The problem arises from the fact that it loops around and continues to check the dead file descriptors, raising an exception each time. The IOError signals that it is no longer valid, so we ignore it and move on. The code below that does the accounting on this. Problem solved.

I don't think httpclient2 is the correct solution; curl_multi_socket_action is pretty new, and many production operating systems may not even have a version of curl with a stable implementation (e.g. CentOS). I would suggest building an http client around tornado's ioloop to remove the pycurl dependency completely (wish I had the time).

I thought elephantum of github had the most elegant solution here:
http://github.com/facebook/tornado/issues#issue/32/comment/74334
He basically stated that friendfeed's epoll module throws the wrong exception.


Ben Darnell

Jan 16, 2010, 3:34:57 AM
to python-...@googlegroups.com
On Fri, Jan 15, 2010 at 10:15 PM, Stephen Day <stev...@gmail.com> wrote:
> I don't think httpclient2 is the correct solution; curl_multi_socket_action
> is pretty new and most production operating systems may not even have a
> version of curl with a stable implementation (ie centos). I would suggest
> building an http client around tornado's ioloop to completely remove the
> pycurl dependency (wish I had the time).

Agreed. Libcurl is such a pain in the neck. Replacing it is one of
those projects that seems easy enough that I'm surprised no one's done
it, but hard enough that I don't want to take it on myself. :)

> I thought elephantum of github had the most elegant solution here:
> http://github.com/facebook/tornado/issues#issue/32/comment/74334
> He basically stated that friendfeed's epoll module throws the wrong
> exception.

I'm not sure you can call it the "wrong" exception (especially since
the FF epoll module predates the one in python 2.6), but it was
definitely an oversight to treat the two as interchangeable without
verifying that their exception behavior is the same (at least for
exceptions important enough to be caught).

-Ben

Sergey Konozenko

Jan 16, 2010, 12:15:16 PM
to python-...@googlegroups.com
Ben,

Thank you for clearing up the issue. Really appreciated!

Sergey

thr...@googlemail.com

Jan 17, 2010, 12:21:16 PM
to Tornado Web Server
I've followed this issue and, if it helps, as far as I can see:

#. http://gist.github.com/279452
#. http://gist.github.com/279453

are the changes needed to patch httpclient.py and ioloop.py to catch
both IOError and OSError in the required places, as Ben recommended
as a solution for the time being.

James.


Stephen Day

Jan 17, 2010, 7:32:18 PM
to python-...@googlegroups.com
I think I meant to put quotes around "wrong"... ;)

Bret Taylor

Jan 18, 2010, 12:14:08 AM
to python-...@googlegroups.com
Thanks for the easy patch. I committed and ran my basic tests, but would love people to confirm this in fact fixes the issues:


Bret

Lorenzo Bolla

Oct 6, 2011, 9:37:40 AM
to python-...@googlegroups.com
Hi all,

I'm reviving this old thread because I'm getting the same errors mentioned here with Tornado 2.1.1.
This is the typical stack trace:

Traceback (most recent call last):
  File "/home/lbolla/.virtualenvs/daryl/lib/python2.7/site-packages/tornado/curl_httpclient.py", line 105, in _handle_socket
    self.io_loop.update_handler(fd, ioloop_event)
  File "/home/lbolla/.virtualenvs/daryl/lib/python2.7/site-packages/tornado/ioloop.py", line 168, in update_handler
    self._impl.modify(fd, events | self.ERROR)

The fix back then was basically to catch both IOError and OSError around the "update_handler" call in httpclient.py.
I can't see a similar fix in Tornado 2.0 (or 2.1).

I can see that in Tornado 2.0's ioloop.py, IOError/OSError is similarly caught in "remove_handler", but not in "update_handler".
Is there any reason for this? Was this change dropped deliberately?

Thanks,
Lorenzo 

Ben Darnell

Oct 7, 2011, 12:25:48 AM
to python-...@googlegroups.com
curl_httpclient was completely rewritten since that change (to use the
socket_action curl API instead of the older fdset/perform API), so
that's when the exception handler was lost. I'm not sure that
swallowing exceptions around update_handler is the right thing to do,
though. Did you see this problem in Tornado 2.0 or 2.1? It could be
a side effect of the EPOLLRDHUP change in 2.1.1, in which case the
right answer is to change how curl_httpclient reports connection
closures to libcurl. Can you identify a reproducible test case, or is
it just random?

-Ben

Lorenzo Bolla

Oct 7, 2011, 6:15:02 AM
to python-...@googlegroups.com
Hi Ben,

The defect is not easy to reproduce reliably.
I've put together the following script, which hits one of the most affected APIs multiple times: some of the requests are fine, others show these errors.
Try running it a couple of times and you should see the dreaded stack traces. Note, though, that all the callbacks are eventually called, so the exception raised does not seem to be that harmful.
Without the "curl" client, the stack traces are not shown, but the client does not seem to call its callback at all.

L.


===

from tornado.ioloop import IOLoop
import random

from tornado.httpclient import AsyncHTTPClient
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")

counter = 0
max_c = 10

def handle_request(response):
    global counter
    if response.error:
        print "Error:", response.error
    else:
        counter += 1
        print counter
    if counter == max_c:
        IOLoop.instance().stop()

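# NOTE: the line that built the URL list was lost from the original post.
# API_URL is a hypothetical placeholder for "one of the most affected APIs".
API_URL = "http://example.com/affected-api"
urls = [API_URL
        for i in xrange(max_c)
        ]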

client = AsyncHTTPClient()
for url in urls:
    client.fetch(url,
            handle_request,
            follow_redirects=True,
            max_redirects=5,
            )
IOLoop.instance().start()

===

Ben Darnell

Oct 7, 2011, 2:32:15 PM
to python-...@googlegroups.com
OK, got it. It's not an epoll-specific issue, since I see it with
kqueue too (which means it wasn't caused by the 2.1.1 EPOLLRDHUP
change). It's a libcurl bug with following redirects; in fact it
looks like another variation of the same problem as
https://sourceforge.net/tracker/?func=detail&aid=3017819&group_id=976&atid=100976,
which was one of the reasons I gave up on libcurl and wrote
simple_httpclient instead. I recommend switching away from
curl_httpclient; if for some reason you cannot switch the simplest
workaround is probably to turn off follow_redirects and process
redirects yourself.
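If you do turn off follow_redirects, the redirect logic moves into your own callback. Below is a minimal, hypothetical helper for that (next_redirect is not a Tornado API): given the URL just fetched, the response status and its Location header, it returns the absolute URL to fetch next, or None when the response is final, and enforces a limit similar to max_redirects. Resolving relative Location values with urljoin is an assumption about what your servers send; the redirect codes are the usual 301/302/303/307/308.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    pass


def next_redirect(url, code, location, hops, max_redirects=5):
    """Return the next absolute URL to fetch, or None if `code` is not a redirect.

    `hops` counts redirects already followed for this request.
    """
    if code not in REDIRECT_CODES or not location:
        return None
    if hops >= max_redirects:
        raise TooManyRedirects("gave up after %d redirects" % hops)
    # Location may be relative; resolve it against the URL that was fetched.
    return urljoin(url, location)


print(next_redirect("http://example.com/a/b", 302, "/login", hops=0))
# http://example.com/login
print(next_redirect("http://example.com/a/b", 301, "c", hops=1))
# http://example.com/a/c
print(next_redirect("http://example.com/a/b", 200, None, hops=0))
# None
```

In a Tornado callback you would fetch with follow_redirects=False, pass response.code and response.headers.get("Location") to a helper like this, and fetch again with hops + 1. Non-2xx responses, 3xx included, are likely to be reported through response.error as well, so check the code before treating the response as a failure.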

-Ben

Lorenzo Bolla

Oct 7, 2011, 5:24:29 PM
to python-...@googlegroups.com
Thanks for the explanation, Ben: it is brilliant.
I decided to work around this bug by switching to the "select.poll" implementation of IOLoop (still with the curl client): this setup does not seem to suffer from this bug.

L.