Working on a Common Lisp Jupyter notebook server - and experiencing a crash in ipython code

Christian Schafmeister

Dec 17, 2017, 11:03:39 PM
to Project Jupyter

I am working on the cl-jupyter system (a Jupyter notebook kernel for Common Lisp - in collaboration with Frederic Peschanski) to add jupyter widgets. We have added support for jupyter widgets 7.0 and we have added support for 'nglview', a jupyter widget for visualizing molecules.

I am encountering a problem that appears to be a threading issue. When I evaluate input cells (shift-enter) slowly, one after another - everything works fine. When I evaluate cells in rapid succession (shift-enter) within the jupyter notebook, I get an error and a backtrace from the PYTHON jupyter client! There is no problem in the Common Lisp code. The backtrace suggests that there was a problem with message signing but it looks to me like more than one thread is injecting text into the message.

I am unclear as to how the PYTHON jupyter client even enters into the picture. I thought the Common Lisp kernel communicated directly with the browser. But I suppose something must send the javascript code to the browser.

I'd love to talk to someone about what might be going on under the hood and how I might approach fixing this.

I've been trying to use wireshark to debug the problem by monitoring websockets communication but that hasn't been fruitful.


Backtrace follows:

[Shell] top of main loop
[Shell] top of main loop
[Shell] top of main loop
[E 22:37:45.606 NotebookApp] Uncaught exception GET /api/kernels/efa66e9c-2964-468e-ac1c-f3a160c9ea24/channels?session_id=EEDF893B785540728F1632EED7969035 (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/kernels/efa66e9c-2964-468e-ac1c-f3a160c9ea24/channels?session_id=EEDF893B785540728F1632EED7969035', version='HTTP/1.1', remote_ip='::1', headers={'Upgrade': 'websocket', 'Connection': 'Upgrade', 'Host': 'localhost:8888', 'Origin': 'http://localhost:8888', 'Cookie': '_xsrf=2|f7610529|4eb65ee3b9ca2742109faeef27ae7f1d|1512343173', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Sec-Websocket-Key': 'h/2hVsgCgp7bQouBzmEDzA==', 'Sec-Websocket-Version': '13', 'Sec-Websocket-Extensions': 'x-webkit-deflate-frame', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0.2 Safari/604.4.7'})
    Traceback (most recent call last):
      File "/Users/meister/anaconda/lib/python3.6/site-packages/tornado/web.py", line 1425, in _stack_context_handle_exception
        raise_exc_info((type, value, traceback))
      File "<string>", line 3, in raise_exc_info
      File "/Users/meister/anaconda/lib/python3.6/site-packages/tornado/stack_context.py", line 314, in wrapped
        ret = fn(*args, **kwargs)
      File "/Users/meister/anaconda/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
        self.on_recv(lambda msg: callback(self, msg), copy=copy)
      File "/Users/meister/anaconda/lib/python3.6/site-packages/notebook/services/kernels/handlers.py", line 299, in _on_zmq_reply
        msg = self.session.deserialize(fed_msg_list)
      File "/Users/meister/anaconda/lib/python3.6/site-packages/jupyter_client/session.py", line 902, in deserialize
        raise ValueError("Invalid Signature: %r  check: %r  self.key: %s    msg_list[0:5]: %s" % (signature, check, self.key, list(msg_list[0:5])))
    ValueError: Invalid Signature: b'5bef6d6ed9017584a7278c7a6ba36ab6b34f0c08a3cc4a8fa3d0dddea7596d2a'  check: b'8ca9da17ff933d6bcc172bfb24aa839ccf263be2b1e3309f66c76f499073a66a'  self.key: b'f4a3d14d-3d3c9527fd37fa880800a466'    msg_list[0:5]: [b'5bef6d6ed9017584a7278c7a6ba36ab6b34f0c08a3cc4a8fa3d0dddea7596d2a', b'{"msg_id": "BD40C37A-B4CE-4D7E-97E1-C952DC580BEF","username": "username","session": "C1F347B35D82477D8214F19AF2071668","msg_type": "comm_msg","version": "5.0"}', b'{"msg_id": "9DEC24D5A6CF4AB68371D72660361627","username": "username","session": "C1F347B35D82477D8214F19AF2071668","msg_type": "execute_request","version": "5.0"}', b'{}', b'status']
[E 22:37:45.731 NotebookApp] Uncaught exception GET /api/kernels/efa66e9c-2964-468e-ac1c-f3a160c9ea24/channels?session_id=C1F347B35D82477D8214F19AF2071668 (127.0.0.1)
    HTTPServerRequest(protocol='http', host='0.0.0.0:8888', method='GET', uri='/api/kernels/efa66e9c-2964-468e-ac1c-f3a160c9ea24/channels?session_id=C1F347B35D82477D8214F19AF2071668', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Host': '0.0.0.0:8888', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:56.0) Gecko/20100101 Firefox/56.0', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate', 'Sec-Websocket-Version': '13', 'Origin': 'http://0.0.0.0:8888', 'Sec-Websocket-Extensions': 'permessage-deflate', 'Sec-Websocket-Key': '0Fxxh4O0hoYMmdO2rBOTJA==', 'Cookie': '_xsrf=2|ebedd344|91b249d904fc164e70b3252d5309c610|1511624179', 'Connection': 'keep-alive, Upgrade', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Upgrade': 'websocket'})
    Traceback (most recent call last):
      File "/Users/meister/anaconda/lib/python3.6/site-packages/tornado/web.py", line 1425, in _stack_context_handle_exception
        raise_exc_info((type, value, traceback))
      File "<string>", line 3, in raise_exc_info
      File "/Users/meister/anaconda/lib/python3.6/site-packages/tornado/stack_context.py", line 314, in wrapped
        ret = fn(*args, **kwargs)
      File "/Users/meister/anaconda/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
        self.on_recv(lambda msg: callback(self, msg), copy=copy)
      File "/Users/meister/anaconda/lib/python3.6/site-packages/notebook/services/kernels/handlers.py", line 299, in _on_zmq_reply
        msg = self.session.deserialize(fed_msg_list)
      File "/Users/meister/anaconda/lib/python3.6/site-packages/jupyter_client/session.py", line 902, in deserialize
        raise ValueError("Invalid Signature: %r  check: %r  self.key: %s    msg_list[0:5]: %s" % (signature, check, self.key, list(msg_list[0:5])))
    ValueError: Invalid Signature: b'5bef6d6ed9017584a7278c7a6ba36ab6b34f0c08a3cc4a8fa3d0dddea7596d2a'  check: b'8ca9da17ff933d6bcc172bfb24aa839ccf263be2b1e3309f66c76f499073a66a'  self.key: b'f4a3d14d-3d3c9527fd37fa880800a466'    msg_list[0:5]: [b'5bef6d6ed9017584a7278c7a6ba36ab6b34f0c08a3cc4a8fa3d0dddea7596d2a', b'{"msg_id": "BD40C37A-B4CE-4D7E-97E1-C952DC580BEF","username": "username","session": "C1F347B35D82477D8214F19AF2071668","msg_type": "comm_msg","version": "5.0"}', b'{"msg_id": "9DEC24D5A6CF4AB68371D72660361627","username": "username","session": "C1F347B35D82477D8214F19AF2071668","msg_type": "execute_request","version": "5.0"}', b'{}', b'status']
[Shell] top of main loop
[Shell] top of main loop

Roland Weber

Dec 18, 2017, 2:58:08 AM
to Project Jupyter
Hello Christian,

the architecture looks like this:

Browser <-> Jupyter Notebook <-> Kernel

The browser communicates with the Jupyter Notebook server using HTTP and Websockets.
The notebook server communicates with your kernel using ZMQ connections.
In other words, the notebook server translates between websocket connections and ZMQ connections.
It's written in python, and that's where the Jupyter Client is being used.

With Jupyter Kernel Gateway, which plays a similar role as Notebook for non-browser clients, we've encountered invalid signatures when a kernel used the wrong character encoding. Are you using utf-8 consistently when processing messages?
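For context on why either an encoding mismatch or a stray frame produces "Invalid Signature": in the Jupyter wire protocol the signature is the hex HMAC digest (SHA-256 by default) of the header, parent header, metadata and content frames, in that order, keyed with the key from the connection file. Below is a minimal sketch of that check using only the Python standard library. The key, the message and the helper names `sign` and `verify` are made up for illustration; this is not the jupyter_client code itself.

```python
import hashlib
import hmac
import json
from typing import Sequence


def sign(key: bytes, frames: Sequence[bytes]) -> bytes:
    """Hex HMAC-SHA256 digest over the frames, fed in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest().encode("ascii")


def verify(key: bytes, signature: bytes, frames: Sequence[bytes]) -> bool:
    """Recompute the digest on the receiving side and compare."""
    return hmac.compare_digest(signature, sign(key, frames))


# Hypothetical key and message, not the values from the traceback.
key = b"hypothetical-connection-key"
header = json.dumps(
    {"msg_id": "1", "msg_type": "stream", "version": "5.0"}
).encode("utf-8")
parent_header = b"{}"
metadata = b"{}"
text = json.dumps({"name": "stdout", "text": "1 \u00c5"}, ensure_ascii=False)

content_utf8 = text.encode("utf-8")      # what the receiver expects
content_latin1 = text.encode("latin-1")  # same text, different bytes

# The sender signs the UTF-8 frames.
signature = sign(key, [header, parent_header, metadata, content_utf8])

ok = verify(key, signature, [header, parent_header, metadata, content_utf8])
wrong_encoding = verify(
    key, signature, [header, parent_header, metadata, content_latin1]
)
# A frame from some other message landing in the content slot.
foreign_frame = verify(key, signature, [header, parent_header, metadata, b"status"])
print(ok, wrong_encoding, foreign_frame)
```

Only the first check passes. Any change to the bytes of the four signed frames, whether from a different encoding or from a frame that belongs to another message, changes the digest.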

hope this helps,
  Roland

Christian Schafmeister

Dec 18, 2017, 9:00:23 AM
to Project Jupyter
Thank you so much for your reply!  
Beyond the ZMQ connection I had a very fuzzy idea of what was going on. 
Your explanation has started to fill in a lot of gaps.

I fixed the problem last night - it turned out to be a threading issue.
When I rapidly evaluated notebook cells (hitting shift-enter really fast) the Python in Jupyter Notebook would crash as shown.
Multiple threads were calling the pzmq:send function at the same time and the messages were becoming interleaved and garbled.
I wrapped a mutex around the function that calls pzmq:send multiple times and the problem went away.
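The shape of that fix can be sketched in Python with only the standard library. This is an illustration, not the cl-jupyter code: `RecordingSocket` is a hypothetical stand-in for a ZMQ socket that just records frames in arrival order, its `more` argument stands in for the SNDMORE flag, and `message_send` only borrows the name of the Lisp function.

```python
import threading
import time

PARTS_PER_MESSAGE = 5


class RecordingSocket:
    """Hypothetical stand-in for a ZMQ socket: records frames as they arrive."""

    def __init__(self):
        self.frames = []

    def send(self, frame, more=False):
        self.frames.append((frame, more))


send_lock = threading.Lock()


def message_send(sock, parts):
    """Send all parts of one message while holding the lock."""
    with send_lock:
        for i, part in enumerate(parts):
            sock.send(part, more=(i < len(parts) - 1))
            time.sleep(0.001)  # widen the window in which threads could collide


def worker(sock, name, count):
    for n in range(count):
        tag = f"{name}-{n}".encode("ascii")
        message_send(
            sock,
            [tag + suffix for suffix in
             (b":signature", b":header", b":parent", b":metadata", b":content")],
        )


def messages_intact(frames):
    """True if every run of PARTS_PER_MESSAGE frames comes from one message."""
    expected_flags = [True] * (PARTS_PER_MESSAGE - 1) + [False]
    for start in range(0, len(frames), PARTS_PER_MESSAGE):
        group = frames[start:start + PARTS_PER_MESSAGE]
        tags = {frame.split(b":")[0] for frame, _ in group}
        flags = [more for _, more in group]
        if len(tags) != 1 or flags != expected_flags:
            return False
    return True


sock = RecordingSocket()
threads = [
    threading.Thread(target=worker, args=(sock, f"t{i}", 5)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(sock.frames), messages_intact(sock.frames))
```

With the `with send_lock:` line removed, frames from different threads can end up interleaved in `sock.frames`, which is the kind of garbling described above. Whether it shows up on a given run depends on thread scheduling.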

Roland Weber

Dec 19, 2017, 2:00:33 AM
to Project Jupyter
On Monday, December 18, 2017 at 3:00:23 PM UTC+1, Christian Schafmeister wrote:
I fixed the problem last night - it turned out to be a threading issue.
When I rapidly evaluated notebook cells (hitting shift-enter really fast) the Python in Jupyter Notebook would crash as shown.
Multiple threads were calling the pzmq:send function at the same time and the messages were becoming interleaved and garbled.
I wrapped a mutex around the function that calls pzmq:send multiple times and the problem went away.

That sounds like a serious bug. Jupyter is supposed to handle simultaneous requests from multiple clients to the same kernel, afaik. This should not garble messages.

Are you sure that the problem is on the sending side? Or could it be that your kernel expects messages in sequence, although the messaging protocol makes no guarantees about that? It's not impossible that you found a problem in Jupyter. But it seems strange that something so fundamental would have gone unnoticed until now.

If you think that the problem is in Jupyter, please open an issue and point us to the code where you had to implement the fix.

best regards,
  Roland

Christian Schafmeister

Dec 19, 2017, 7:28:32 AM
to Project Jupyter
I'm certain it's a problem on my kernel side; it has to do with sending messages to Jupyter, not with receiving them from Jupyter.  The messages are sent out using multiple calls to the pzmq:send function, and when multiple threads were doing this at the same time the messages got garbled.  I wrapped the sending code in a mutex and the problem went away.
I don't think it's a jupyter issue.

Thomas Kluyver

Dec 19, 2017, 7:37:12 AM
to Project Jupyter
The Jupyter protocol relies on ZMQ 'multipart' messages. IIRC, a multipart message is a series of individual messages with the SNDMORE flag set on all but the last one. I don't know if those parts are meant to be separated out again if two threads are sending parts interleaved to the same endpoint.
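For reference, the frame layout of one Jupyter message on the wire is: zero or more routing identities, the delimiter `<IDS|MSG>`, the signature, then header, parent header, metadata and content as JSON, optionally followed by raw buffers. A rough standard-library sketch of splitting a received frame list follows. The helper names and the frame values are invented for illustration. The second list mimics the `msg_list[0:5]` from the traceback, where `b'status'` sits where the content JSON should be; reading that as a frame from another message is an interpretation, not something the log proves.

```python
import json

DELIM = b"<IDS|MSG>"


def split_frames(frames):
    """Split a multipart frame list into identities, signature, JSON parts, buffers."""
    idx = frames.index(DELIM)
    identities = frames[:idx]
    signature = frames[idx + 1]
    json_parts = frames[idx + 2:idx + 6]  # header, parent_header, metadata, content
    buffers = frames[idx + 6:]
    return identities, signature, json_parts, buffers


def is_json_object(frame):
    """True if the frame decodes to a JSON object."""
    try:
        return isinstance(json.loads(frame), dict)
    except ValueError:
        return False


# Hypothetical well-formed message.
clean = [
    b"client-identity", DELIM, b"signature",
    b'{"msg_type": "status"}', b"{}", b"{}", b'{"execution_state": "idle"}',
]

# Hypothetical garbled message: a non-JSON frame occupies the content slot.
interleaved = [
    b"client-identity", DELIM, b"signature",
    b'{"msg_type": "comm_msg"}', b'{"msg_type": "execute_request"}', b"{}",
    b"status",
]

for name, frames in (("clean", clean), ("interleaved", interleaved)):
    identities, signature, json_parts, buffers = split_frames(frames)
    print(name, [is_json_object(part) for part in json_parts])
```

The receiver assigns meaning to frames purely by position after the delimiter, so one misplaced frame shifts everything after it and the signature check fails before the JSON is even parsed.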

--
You received this message because you are subscribed to the Google Groups "Project Jupyter" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jupyter+unsubscribe@googlegroups.com.
To post to this group, send email to jup...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jupyter/c190dae1-cf7d-43d8-99ec-943ca48ba71e%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Christian Schafmeister

Dec 19, 2017, 9:22:04 AM
to Project Jupyter
Yes - this is how pzmq (a Common Lisp wrapper library for the zmq library) works as well.
Here is the code that I added the mutex to (it's Common Lisp but you should be able to read it):


The calls to 'message-send' all pass the same sockets (iopub socket or shell socket), so if multiple threads call 'message-send' at the same time, nothing distinguishes which thread is sending which parts.  Wrapping a mutex around the multiple pzmq:send calls ensures that only one thread at a time sends the parts of a message, so parts from different messages can't interleave.

I don't know how threading works in Python - maybe this isn't an issue in Python.
In Common Lisp I'm using the C pthread library for threading and threads trample on each other if I don't take care like this.




Thomas Kluyver

Dec 19, 2017, 9:54:30 AM
to Project Jupyter
I think the design of ZMQ is that each thread should have its own socket. But I don't know whether this would help with the issue you're seeing. And there may be an extra complication because the kernel binds the sockets (not connects), and I think only one socket can be bound to an endpoint at once.
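One possible way to square "one thread per socket" with a single bound socket, offered only as a sketch and not something tried in this thread: let one dedicated thread own the socket, and have every other thread hand it complete messages through a queue. The example below uses only the Python standard library. `RecordingSocket` is a hypothetical stand-in for the real socket, and its `more` argument stands in for the SNDMORE flag.

```python
import queue
import threading


class RecordingSocket:
    """Hypothetical stand-in for a ZMQ socket: records frames as they arrive."""

    def __init__(self):
        self.frames = []

    def send(self, frame, more=False):
        self.frames.append((frame, more))


def sender_loop(sock, outbox):
    """Only this thread touches the socket; it sends one whole message at a time."""
    while True:
        parts = outbox.get()
        if parts is None:  # sentinel: shut down
            break
        for i, part in enumerate(parts):
            sock.send(part, more=(i < len(parts) - 1))


def worker(outbox, name, count):
    """Producers never call send; they enqueue complete messages."""
    for n in range(count):
        tag = f"{name}-{n}".encode("ascii")
        outbox.put([tag + b":header", tag + b":content"])


def pairs_intact(frames):
    """True if every consecutive pair of frames belongs to the same message."""
    for start in range(0, len(frames), 2):
        (first, first_more), (second, second_more) = frames[start:start + 2]
        same_message = first.split(b":")[0] == second.split(b":")[0]
        if not (same_message and first_more and not second_more):
            return False
    return True


sock = RecordingSocket()
outbox = queue.Queue()
sender = threading.Thread(target=sender_loop, args=(sock, outbox))
sender.start()

workers = [
    threading.Thread(target=worker, args=(outbox, f"t{i}", 5)) for i in range(4)
]
for t in workers:
    t.start()
for t in workers:
    t.join()

outbox.put(None)  # all producers are done; stop the sender
sender.join()

print(len(sock.frames), pairs_intact(sock.frames))
```

Compared with a mutex around the send calls, this keeps all socket access on one thread, at the cost of an extra thread and a queue hop per message.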

Thomas
