[I 091106 21:24:14 web:725] 302 POST /foo/ (127.0.0.1) 895.19ms
[D 091106 21:24:14 ioloop:133] Error deleting fd from IOLoop
Traceback (most recent call last):
  File "/home/x/tornado/tornado/ioloop.py", line 131, in remove_handler
    self._impl.unregister(fd)
  File "/home/x/tornado/tornado/ioloop.py", line 308, in unregister
    epoll.epoll_ctl(self._epoll_fd, self._EPOLL_CTL_DEL, fd, 0)
OSError: [Errno 9] Bad file descriptor
-Elias
According to one of the libcurl devs, epoll and curl_multi_perform are
not a good combination; the newer curl_multi_socket_action should be used instead.
I've created an initial version of the AsyncHTTPClient that uses
socket_action instead of perform.
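With curl_multi_socket_action, libcurl tells the application which sockets to watch through a socket callback (CURLMOPT_SOCKETFUNCTION), and it reports CURL_POLL_REMOVE for a socket it is done with. Below is a minimal sketch of the bookkeeping side only. It assumes a poller with register/modify/unregister methods like Python's select.epoll. SocketTracker, on_socket and RecordingPoller are hypothetical names, and the wiring into pycurl and the IOLoop is left out.

```python
# Values mirror libcurl's CURL_POLL_* constants (curl/multi.h).
POLL_NONE, POLL_IN, POLL_OUT, POLL_INOUT, POLL_REMOVE = 0, 1, 2, 3, 4

# Event bits for the poller; chosen to match Linux EPOLLIN/EPOLLOUT.
READ, WRITE = 0x001, 0x004

_MASKS = {POLL_NONE: 0, POLL_IN: READ, POLL_OUT: WRITE, POLL_INOUT: READ | WRITE}


class SocketTracker:
    """Keeps a poller in step with libcurl's socket callback (hypothetical sketch)."""

    def __init__(self, poller):
        self.poller = poller  # any object with register/modify/unregister, e.g. select.epoll
        self.fds = {}         # fd -> event mask currently requested from the poller

    def on_socket(self, what, fd):
        if what == POLL_REMOVE:
            # libcurl no longer needs this socket: unregister while the fd is still valid.
            if fd in self.fds:
                del self.fds[fd]
                self.poller.unregister(fd)
            return
        mask = _MASKS[what]
        if fd in self.fds:
            self.poller.modify(fd, mask)
        else:
            self.poller.register(fd, mask)
        self.fds[fd] = mask


class RecordingPoller:
    """Fake poller that records calls instead of touching the kernel."""

    def __init__(self):
        self.calls = []

    def register(self, fd, mask):
        self.calls.append(("register", fd, mask))

    def modify(self, fd, mask):
        self.calls.append(("modify", fd, mask))

    def unregister(self, fd):
        self.calls.append(("unregister", fd))


poller = RecordingPoller()
tracker = SocketTracker(poller)
tracker.on_socket(POLL_OUT, 7)     # connecting: wait for writability
tracker.on_socket(POLL_IN, 7)      # request sent: wait for the response
tracker.on_socket(POLL_REMOVE, 7)  # libcurl is done with the socket
print(poller.calls)
# [('register', 7, 4), ('modify', 7, 1), ('unregister', 7)]
```

The point of the design is that removal is driven by libcurl's own notification rather than by diffing fd sets after the fact, so the unregister can happen before the descriptor disappears.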
So far I've had very good results, and all of the problems I described
before have disappeared.
The current version of this httpclient is available as httpclient2 on
my tornado fork http://github.com/sris/tornado/blob/04e67fa358b0ac6da04a14d4d64f406d0c3131df/tornado/httpclient2.py
It's still very immature and there's still work to be done, so any
feedback and comments are very much appreciated.
Bret et al., is there any specific reason why you chose to use
multi_perform and not socket_action? Maybe something I've overlooked?
I ran into some issues with timeouts because of libcurl known bug #62:
CURLOPT_TIMEOUT does not work properly with the regular multi and
multi_socket interfaces. The work-around for apps is to simply remove the
easy handle once the time is up. See also:
http://curl.haxx.se/bug/view.cgi?id=2501457
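The work-around quoted from the libcurl bug list (remove the easy handle once the time is up) can be done on the application side by keeping deadlines in a heap and sweeping them from a timer. A sketch, assuming a multi object with remove_handle() as pycurl.CurlMulti provides; DeadlineSweeper and its methods are hypothetical, and reporting the timeout to the request's callback is left to the caller.

```python
import heapq
import itertools
import time


class DeadlineSweeper:
    """Enforce per-request deadlines by removing easy handles (hypothetical sketch)."""

    def __init__(self, multi, clock=time.monotonic):
        self.multi = multi      # needs remove_handle(handle), as pycurl.CurlMulti has
        self.clock = clock
        self._heap = []         # (deadline, seq, handle), earliest deadline first
        self._seq = itertools.count()  # tie-breaker so handles are never compared
        self._finished = set()  # handles that completed before their deadline

    def add(self, handle, timeout):
        heapq.heappush(self._heap, (self.clock() + timeout, next(self._seq), handle))

    def finished(self, handle):
        """Call when libcurl reports the transfer done, so the sweep leaves it alone."""
        self._finished.add(handle)

    def sweep(self):
        """Remove every unfinished handle whose deadline has passed; return them."""
        now = self.clock()
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, handle = heapq.heappop(self._heap)
            if handle in self._finished:
                self._finished.discard(handle)
                continue
            self.multi.remove_handle(handle)
            expired.append(handle)
        return expired
```

The caller would run sweep() from a periodic timer (for example a Tornado PeriodicCallback) and report each returned handle as timed out.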
Personally I don't really need the request timeouts anyway, so I've
currently replaced them with a parameter called idle_timeout that
instead uses low_speed_limit/low_speed_time and times out a connection if
there's no activity for the specified number of seconds. (The non-async
client still needs the request timeout, and it works there; I removed it
from httpclient2, and the standard httpclient can be used when a request
timeout is needed.)
I will continue to develop this version of the httpclient and test it
as much as possible.
-- Jacob
You're right that it will catch some errors, but I still suspect that
the problem is rooted in epoll removing closed connections from under
libcurl's feet. I think it depends on a number of things, such as
reference counts to the sockets in question. I'm not sure, but
since I was unable to reproduce it I won't say too much.
What I do know is that epoll and multi_perform are not a recommended
mix (according to Daniel Stenberg of libcurl).
Before digging deeper into these issues we also patched the weak spots
in the httpclient to catch the thrown errors. What makes me nervous,
though, is that at a later stage we ran into an issue where libcurl was
getting stuck in an infinite loop caused by a bad fd. I still haven't
managed to reproduce this fully, but it always happened after a
while of heavy httpclient usage.
I'd love to be proven wrong because it would make life much easier for
me, but simply catching these errors doesn't seem to be the fix.
-- Jacob
We do really heavy httpclient access...
Does anyone think any of the above fixes will work?
David
I've added the exception handlers suggested above, but still get some
nasty stuff in my logs and have to restart the servers.
D
On Dec 19 2009, 9:13 pm, David Novakovic <davidnovako...@gmail.com>
wrote:
> > > > and httpclient.py that makes exception messages go away. I suspect
> > > > that Jacob is right as this may just hide the issue. I haven't
> > > > experienced the infinite loop Jacob is talking about yet. I hope Jacob
> > > > will be proved wrong as the code is going into production in a week :-D
>
> > > > Is there anybody who can look into this issue? Basically, I wonder
> > > > why the authors didn't run into this issue themselves, as they seemingly
> > > > use the client in the same way.
>
> > > > Could it be an issue with the libcurl version? I'm using a slightly later
> > > > version than the one the tornado web site's instructions mention.
>
> > > > Thanks,
> > > > Sergey
>
> > > > On Nov 6, 4:02 pm, Jacob Kristhammar <kristham...@gmail.com> wrote:
> > > > > Hi,
>
> > > > > The last couple of weeks we've been using Tornado and the async
> > > > > httpclient to develop a server that juggles long-poll requests on one
> > > > > side and a lot of async requests on the other. This server was
> > > > > initially developed on a Mac and was hence using select and not epoll.
> > > > > When we moved the server to a Linux (Ubuntu) machine weird things
> > > > > started to happen in the httpclient.
>
> > > > > Sometimes the httpclient loses track of some of the fd's used by
ERROR:root:Exception in callback <bound method AsyncHTTPClient._perform of <tornado.httpclient.AsyncHTTPClient object at 0x8487ccc>>
Traceback (most recent call last):
  File "/home/bendaware/tweete/tornado/ioloop.py", line 238, in _run_callback
    callback()
  File "/home/bendaware/tweete/tornado/httpclient.py", line 214, in _perform
    self.io_loop.remove_handler(fd)
  File "/home/bendaware/tweete/tornado/ioloop.py", line 133, in remove_handler
    self._impl.unregister(fd)
I'll try adding a handler here and see how it goes.
D
The underlying issue, for those of you following along, is that when
there is a timeout, libcurl will close the socket. On the next
iteration AsyncHTTPClient will notice that the socket is no longer in
curl_multi_fdset and unregister it with the epoll object. Since the
file descriptor is no longer valid, epoll throws an exception. We
have to remove the fd from epoll before closing it. There's no way to
do that with the perform api, but it is possible with socket_action.
The kernel internally removes closed file descriptors from any epoll
objects so we're not leaking resources by letting curl close the
socket before we remove it.
-Ben
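The two orderings Ben describes can be compared directly on Linux with the standard library's select.epoll. This is a sketch, not the client's code. unregister_after_close is a hypothetical helper, and a socketpair stands in for libcurl's socket. On the kernels I know of, the late unregister fails with EBADF. Depending on the Python version, select.epoll.unregister() may itself ignore EBADF, in which case the helper returns None.

```python
import errno
import select
import socket


def unregister_after_close():
    """Linux-only sketch: what epoll says when an fd is unregistered too late.

    Returns the errno raised by unregister(), or None if it did not raise.
    """
    ep = select.epoll()
    try:
        # Safe order: unregister while the fd is still open, then close it.
        a, b = socket.socketpair()
        ep.register(a.fileno(), select.EPOLLIN)
        ep.unregister(a.fileno())
        a.close()
        b.close()

        # The order a perform-based client ends up with: the socket has already
        # been closed (as libcurl does on a timeout) before we unregister it.
        a, b = socket.socketpair()
        fd = a.fileno()
        ep.register(fd, select.EPOLLIN)
        a.close()  # closing the last reference also drops fd from the epoll set
        b.close()
        try:
            ep.unregister(fd)
        except OSError as e:  # IOError is an alias of OSError on Python 3
            return e.errno
        return None
    finally:
        ep.close()


if hasattr(select, "epoll"):
    result = unregister_after_close()
    print(errno.errorcode.get(result, result))
```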
Likewise in httpclient.py:223, which happens if a socket is closed and
then its file descriptor is reused.
-Ben
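The fd-reuse case can also be reproduced in isolation on Linux. This sketch assumes the client keeps a dict of fds it believes are registered (the name `handlers` is hypothetical). It is also an assumption, not something the thread confirms, that the call failing at httpclient.py:223 is an update of this kind. Once the old socket is closed, the kernel forgets its registration. A new socket then receives the same fd number, and the stale dict entry leads to an epoll modify that the kernel rejects with ENOENT.

```python
import errno
import select
import socket


def fd_reuse_demo():
    """Linux-only sketch: a stale fd entry meets a reused fd number.

    Returns the errno from epoll.modify(), None if the fd number was not
    reused this time, or 0 if modify() unexpectedly succeeded.
    """
    ep = select.epoll()
    handlers = {}  # hypothetical bookkeeping: fd -> events we believe are registered
    a, peer_a = socket.socketpair()
    try:
        fd = a.fileno()
        ep.register(fd, select.EPOLLIN)
        handlers[fd] = select.EPOLLIN
        a.close()  # kernel forgets the registration; handlers[fd] is now stale
        b, peer_b = socket.socketpair()  # the lowest free fd number is handed out again
        try:
            if b.fileno() != fd:
                return None
            if fd in handlers:  # looks like an existing registration, so "update" it
                try:
                    ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
                except OSError as e:
                    return e.errno  # the kernel has no entry for this new socket
            return 0
        finally:
            b.close()
            peer_b.close()
    finally:
        a.close()  # no-op if already closed
        peer_a.close()
        ep.close()


if hasattr(select, "epoll"):
    result = fd_reuse_demo()
    print(errno.errorcode.get(result, result))
```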
The problem arises because AsyncHTTPClient loops around and keeps checking the dead file descriptors, raising an exception each time. The IOError signals that the descriptor is no longer valid, so we ignore it and move on; the code below that then does the accounting. Problem solved.
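The fix described above boils down to two things. It swallows the error for an fd that is already gone, and it removes that fd from the client's bookkeeping regardless, so the next pass does not retry it. A sketch, assuming an io_loop with remove_handler(fd) as in Tornado and a dict of currently registered fds. drop_stale_fds, registered and wanted are hypothetical names; this is not the actual patch.

```python
import logging


def drop_stale_fds(io_loop, registered, wanted):
    """Forget fds that libcurl stopped reporting, even if they are already closed.

    io_loop:    object with remove_handler(fd), e.g. a Tornado IOLoop
    registered: dict fd -> events previously handed to io_loop (hypothetical)
    wanted:     dict fd -> events libcurl wants watched on this pass
    """
    for fd in list(registered):  # copy: we delete from the dict while looping
        if fd in wanted:
            continue
        try:
            io_loop.remove_handler(fd)
        except (IOError, OSError):
            # Already closed by libcurl; the kernel dropped it from epoll itself.
            logging.debug("fd %d was already closed", fd)
        # Forget it either way; otherwise every later pass retries the dead fd.
        del registered[fd]
```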
Agreed. Libcurl is such a pain in the neck. Replacing it is one of
those projects that seems easy enough that I'm surprised no one's done
it, but hard enough that I don't want to take it on myself. :)
> I thought elephantum of github had the most elegant solution here:
> http://github.com/facebook/tornado/issues#issue/32/comment/74334
> He basically stated that friendfeed's epoll module throws the wrong
> exception.
I'm not sure you can call it the "wrong" exception (especially since
the FF epoll module predates the one in python 2.6), but it was
definitely an oversight to treat the two as interchangeable without
verifying that their exception behavior is the same (at least for
exceptions important enough to be caught).
-Ben
1. http://gist.github.com/279452
2. http://gist.github.com/279453
These are the changes required to patch httpclient.py and ioloop.py to catch
both IOError and OSError in the required places, as recommended by
Ben as a solution for the time being.
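The gists are not reproduced here. Below is a rough sketch of what the ioloop half of such a patch can look like. It assumes an epoll wrapper with register/unregister, and MiniIOLoop is a hypothetical stand-in, not Tornado's actual IOLoop. The debug message mirrors the one in the log at the top of the thread.

```python
import logging


class MiniIOLoop:
    """Stand-in for the handler registry part of an IOLoop (hypothetical sketch)."""

    def __init__(self, impl):
        self._impl = impl  # epoll-like object with register(fd, events) and unregister(fd)
        self._handlers = {}

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._impl.register(fd, events)

    def remove_handler(self, fd):
        self._handlers.pop(fd, None)
        try:
            self._impl.unregister(fd)
        except (IOError, OSError):
            # The two epoll implementations report EBADF with different exception
            # classes (on Python 3 they are the same class), so catch both.
            logging.debug("Error deleting fd from IOLoop", exc_info=True)
```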
James.
On Jan 16, 5:15 pm, Sergey Konozenko <skonoze...@gmail.com> wrote:
> Ben,
>
> Thank you for clearing up the issue. Really appreciated!
>
> Sergey
>
> On Sat, Jan 16, 2010 at 3:34 AM, Ben Darnell <ben.darn...@gmail.com> wrote:
> > On Fri, Jan 15, 2010 at 10:15 PM, Stephen Day <stevv...@gmail.com> wrote:
> > > I don't think httpclient2 is the correct solution; curl_multi_socket_action
> > > is pretty new and most production operating systems may not even have a
> > > version of curl with a stable implementation (e.g. CentOS). I would suggest
> > > building an http client around tornado's ioloop to completely remove the
> > > pycurl dependency (wish I had the time).