Recursion loop between websocket and iostream

513 views
Skip to first unread message

Justin

unread,
Apr 26, 2011, 2:33:58 PM4/26/11
to Tornado Web Server
Hi All,
This is my first project using Tornado... building a message service
with TornadIO + ZMQ. I'm using pyzmq's ioloop.
I have been running into this issue of a recursive loop between the
iostream and the websocket, and can't yet pinpoint why. Seems a client
connects fine and sends plenty of messages and receives fine. Someone
else connects and is fine for a bit then crash. I dont know if I'm not
handling a disconnect properly, or if something in a message is
causing a read error in the tornado code. Maybe someone can tell me
whats really happening here and if its something I'm not handling
properly...

# Start of the traceback...
[E 110426 01:45:28 ioloop:295] Exception in I/O handler for fd 22
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/pyzmq-2.1.4-py2.6-
linux-x86_64.egg/zmq/eventloop/ioloop.py", line 282, in start
self._handlers[fd](fd, events)
File "/usr/local/lib/python2.6/dist-packages/pyzmq-2.1.4-py2.6-
linux-x86_64.egg/zmq/eventloop/stack_context.py", line 133, in wrapped
callback(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 199, in _handle_events
self._handle_read()
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 258, in _handle_read
if self._read_from_buffer():
# START: here is the start of the block that loops recursively
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 325, in _read_from_buffer
self._consume(loc + delimiter_len))
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 230, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/stack_context.py", line 173, in wrapped
callback(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/websocket.py", line 193, in _on_end_delimiter
self._receive_message()
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/websocket.py", line 178, in _receive_message
self.stream.read_bytes(1, self._on_frame_type)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 149, in read_bytes
if self._read_from_buffer():
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 314, in _read_from_buffer
self._run_callback(callback, self._consume(num_bytes))
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 230, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/stack_context.py", line 173, in wrapped
callback(*args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/websocket.py", line 183, in _on_frame_type
self.stream.read_until("\xff", self._on_end_delimiter)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 133, in read_until
if self._read_from_buffer():
# Loop back to start (line 325 in iostream

Any help??????
Message has been deleted

Justin

unread,
Apr 26, 2011, 2:36:59 PM4/26/11
to Tornado Web Server
I should probably mention my versions. Im using the pypi tornado
1.2.1, tornadIO 0.0.4, zeromq 2.1.4, and pypi pyzmq, on ubuntu linux.

Mikhail Korobov

unread,
Apr 26, 2011, 3:00:35 PM4/26/11
to python-...@googlegroups.com
Hi Justin,

Do you have an exception logged before all these "Exception in I/O handler for fd 22" exceptions? I've observed similar behavior with tornadio if exceptions in certain places were unhandled (don't remember the exact situation). Fixing the original exception (that was an error in my code) fixes this problem for me.

Justin

unread,
Apr 26, 2011, 5:27:39 PM4/26/11
to Tornado Web Server
I was able to get some more logs. I wonder if this is being caused by
the logging module? It looks like the traceback happens once, and then
stacks up twice and happens, then 3 times, on and on.

Here is a snippet from my on_close() that I believe starts this:
# self._client_sub is a class that is holding my ZMQStream 'stream'
<code>
if self._client_sub and self._client_sub.stream:
try:
self._client_sub.stream.stop_on_recv()
self._client_sub.stream.stop_on_err()

except Exception, e:
logging.error("Failed to close stream and sub
socket on client disconnect", exc_info=True)
</code>

And here is the start of the traceback:

<code>
[E 110425 20:52:50 messgr:202] Failed to close stream and sub socket
on client disconnect
Traceback (most recent call last):
File "messgr.py", line 198, in on_close
try:
AttributeError: 'NoneType' object has no attribute 'stream'
Traceback (most recent call last):
File "/usr/lib/python2.6/logging/__init__.py", line 776, in emit
msg = self.format(record)
File "/usr/lib/python2.6/logging/__init__.py", line 654, in format
return fmt.format(record)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-py2.6.egg/
tornado/options.py", line 362, in format
record.exc_text = self.formatException(record.exc_info)
File "/usr/lib/python2.6/logging/__init__.py", line 416, in
formatException
traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
print_tb(tb, limit, file)
File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
line = linecache.getline(filename, lineno, f.f_globals)
File "/usr/lib/python2.6/linecache.py", line 14, in getline
lines = getlines(filename, module_globals)
File "/usr/lib/python2.6/linecache.py", line 40, in getlines
return updatecache(filename, module_globals)
RuntimeError: maximum recursion depth exceeded
Traceback (most recent call last):
Exception RuntimeError: 'maximum recursion depth exceeded while
calling a Python object' in <type 'exceptions.AttributeError'> ignored
[E 110425 20:57:04 websocket:168] Uncaught exception in /realtime/
websocket
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/websocket.py", line 165, in wrapper
return callback(*args, **kwargs)
File "build/bdist.linux-x86_64/egg/tornadio/persistent.py", line
80, in on_message
self.async_callback(self.connection.raw_message)(message)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/websocket.py", line 169, in wrapper
self._abort()
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/websocket.py", line 175, in _abort
self.stream.close()
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 180, in close
self._run_callback(self._close_callback)
File "/usr/local/lib/python2.6/dist-packages/tornado-1.2.1-
py2.6.egg/tornado/iostream.py", line 233, in _run_callback
exc_info=True)
File "/usr/lib/python2.6/logging/__init__.py", line 1456, in
error
root.error(*((msg,)+args), **kwargs)
File "/usr/lib/python2.6/logging/__init__.py", line 1082, in
error
self._log(ERROR, msg, args, **kwargs)
File "/usr/lib/python2.6/logging/__init__.py", line 1173, in
_log
self.handle(record)
File "/usr/lib/python2.6/logging/__init__.py", line 1183, in
handle
self.callHandlers(record)
File "/usr/lib/python2.6/logging/__init__.py", line 1220, in
callHandlers
hdlr.handle(record)
File "/usr/lib/python2.6/logging/__init__.py", line 679, in
handle
self.emit(record)
File "/usr/lib/python2.6/logging/__init__.py", line 804, in emit
self.handleError(record)
File "/usr/lib/python2.6/logging/__init__.py", line 733, in
handleError
traceback.print_exception(ei[0], ei[1], ei[2], None,
sys.stderr)
File "/usr/lib/python2.6/traceback.py", line 125, in
print_exception
print_tb(tb, limit, file)
File "/usr/lib/python2.6/traceback.py", line 57, in print_tb
if hasattr(sys, 'tracebacklimit'):
AttributeError: 'module' object has no attribute 'tracebacklimit'
# ... and repeat this again, 3 times...and then 4 times....
</code>

Ben Darnell

unread,
Apr 26, 2011, 10:52:02 PM4/26/11
to python-...@googlegroups.com
Python has a fairly small stack size, and small operations on IOStream can consume the stack pretty rapidly.  If you're sending lots (hundreds, I think) of small messages on one websocket connection, you could be legitimately hitting the stack limit.  I've been meaning to route IOStream callbacks through IOLoop.add_callback to address this issue.  

If this is happening when you're not sending many messages, then there's some other bug.  I don't know how zmq's fork of tornado.ioloop works, but the fact that you've got both zmq.eventloop.stack_context and tornado.stack_context in that trace is awfully suspicious.  

If the problem is just lots of small messages blowing out the stack, try this change.  I can't commit it as-is because it increases the chance of running out of file descriptors, but it should be a step in the right direction for the problem you're seeing.  

-Ben

diff --git a/tornado/iostream.py b/tornado/iostream.py
index 65f36ed..15d4fd7 100644
--- a/tornado/iostream.py
+++ b/tornado/iostream.py
@@ -20,6 +20,7 @@ from __future__ import with_statement
 
 import collections
 import errno
+import functools
 import logging
 import socket
 import sys
@@ -224,10 +225,7 @@ class IOStream(object):
 
     def _run_callback(self, callback, *args, **kwargs):
         try:
-            # Use a NullContext to ensure that all StackContexts are run
-            # inside our blanket exception handler rather than outside.
-            with stack_context.NullContext():
-                callback(*args, **kwargs)
+            self.io_loop.add_callback(functools.partial(callback, *args, **kwargs))
         except:
             logging.error("Uncaught exception, closing connection.",
                           exc_info=True)

Justin

unread,
Apr 27, 2011, 1:06:16 PM4/27/11
to Tornado Web Server
Awesome, thanks for the reply! I will try this out.

I dont think we are doing a huge volume of messages, but they are very
small messages each. The test case that this happens under is with 2
clients connected and about 50 messages per second combined. I think
the average is to have about 20 very small msgs per second per
connected websocket client.

As for the ioloop, I am importing it from zmq, and passing it to
tornadIO's routers, SocketServer, and any ZMQStreams. I looked through
the tornadIO code and it seems like it is handling me passing my
ioloop instance through to the router and server just fine (it checks
for my arg first and falls back on tornado ioloop).

Justin

unread,
Apr 27, 2011, 4:16:16 PM4/27/11
to Tornado Web Server
I tried that patch and it possibly has helped. We were able to have
about 20 clients running without an issue.
https://github.com/justinfx/tornado

Thanks!


On Apr 27, 10:06 am, Justin <justinisr...@gmail.com> wrote:

Ben Darnell

unread,
May 1, 2011, 3:58:36 PM5/1/11
to python-...@googlegroups.com
I've just committed an improved version of this patch that fixes up the error handling.

-Ben
Reply all
Reply to author
Forward
0 new messages