How soon is too soon to send a message to the shell channel?

Trevor Murphy

Feb 8, 2018, 5:49:47 PM
to Project Jupyter
I'm playing around with a hand-rolled jupyter client, mainly to learn ZMQ and the jupyter messaging protocol.

My "client" starts a jupyter console and then starts sending messages to the shell channel / receiving from the shell and iopub channels.

I notice that, often, I'm unable to read from the shell and iopub channels if I send a message as soon as possible after the console program starts.  Like, my code does:

  1. shell out a jupyter console, wait for return
  2. poll the shell socket's ZMQ_EVENTS option until ZMQ_POLLOUT is true (incidentally, this always succeeds on the first poll)
  3. send a message to the shell socket
  4. poll the shell and iopub sockets' ZMQ_EVENTS options until ZMQ_POLLIN is true (both polls spin forever).

If I make the code wait for a few seconds, the send and receive roundtrip completes just fine, a/k/a the polls at (4) succeed and my code continues.

Can anybody point me to a way to debug this?  I tried starting the console with --log-level=DEBUG but none of the log messages say anything about what's going on at this level.

Or, am I just doing the poll wrong?  I would've thought it would be fine to send when ZMQ_POLLOUT is true, but it looks like the kernel isn't receiving my message in this case.

MinRK

Feb 12, 2018, 10:03:46 AM
to Project Jupyter

You’re doing everything right, there’s just some zmq magic that’s getting in the way of things behaving clearly. The short answer to your question is that it’s never too early to send a shell request. zmq is ‘connectionless’, which is the zmq way of saying that you can send messages even when the other end hasn’t shown up yet. It will handle delivering the message when the kernel shows up. This is why POLLOUT is true immediately. What you are likely running into is a failure to propagate subscriptions on the PUB/SUB channel. If a PUB socket sends a message while it has no registered subscribers, it discards that message immediately. And propagating those subscriptions takes a finite amount of time.

So the common failure is:

  1. request kernel start
  2. send request immediately
  3. kernel starts, binds, handles request, sends replies
  4. PUB/SUB subscriptions haven’t propagated, so PUB messages are discarded
  5. zmq PUB/SUB subscriptions finish propagating (too late!)
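The failure sequence above can be illustrated with a toy model in plain Python. This is not real zmq: `ToyPub`, `register`, and `run` are hypothetical names made up for this sketch, and the only behavior modeled is that a publisher with no registered subscribers drops what it sends.

```python
class ToyPub:
    """Toy stand-in for a PUB socket (hypothetical, not the zmq API)."""

    def __init__(self):
        # Inboxes whose subscription has already reached the publisher.
        self.subscribers = []

    def register(self, inbox):
        # Models a subscription finishing its propagation to the publisher.
        self.subscribers.append(inbox)

    def send(self, msg):
        # Deliver to every registered subscriber; with none, the message is lost.
        for inbox in self.subscribers:
            inbox.append(msg)


def run(subscribe_first):
    """Return what the client's IOPub side receives in each ordering."""
    pub = ToyPub()
    iopub_inbox = []
    if subscribe_first:
        pub.register(iopub_inbox)
    # The kernel handles the request and publishes its status messages.
    pub.send("status: busy")
    pub.send("status: idle")
    if not subscribe_first:
        pub.register(iopub_inbox)  # subscription arrives too late
    return iopub_inbox


print(run(subscribe_first=False))  # []
print(run(subscribe_first=True))   # ['status: busy', 'status: idle']
```

In the first ordering the client polls an inbox that never fills, which matches the symptom of an IOPub poll that spins forever.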

The way we deal with this is explicitly waiting for IOPub messages to be delivered before sending requests. This is implemented in BlockingKernelClient.wait_for_ready. But the logic is:

  1. send kernel_info_request
  2. wait for reply
  3. wait for status: idle on IOPub

If the reply came and idle didn’t, the IOPub subscriptions may not have propagated yet, so run the sequence again. Propagation typically takes ~milliseconds, so the second try will always work. Here’s a script that starts a kernel and connects a client to it and runs a bit of code.
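A minimal sketch of that retry logic, written against stand-in callables rather than real sockets. It is neither the script mentioned above nor the actual body of BlockingKernelClient.wait_for_ready; the function name `wait_until_ready`, its three hook parameters, and the timeout values are all assumptions made for illustration.

```python
def wait_until_ready(send_kernel_info_request, wait_for_reply, wait_for_idle,
                     max_attempts=10):
    """Repeat the request until both the shell reply and IOPub idle arrive.

    All three callables are hypothetical hooks supplied by the caller:
      send_kernel_info_request()  sends the request on the shell channel
      wait_for_reply(timeout)     returns True if a shell reply arrived
      wait_for_idle(timeout)      returns True if 'status: idle' arrived on IOPub
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        send_kernel_info_request()
        if not wait_for_reply(timeout=1.0):
            continue  # no reply yet (kernel may still be starting); try again
        if wait_for_idle(timeout=0.1):
            return attempt  # both channels delivered, so IOPub is subscribed
        # Reply came but idle did not: the subscription probably had not
        # propagated when the kernel published, so go around again.
    raise TimeoutError("kernel did not become ready")


# Simulated kernel whose first IOPub status message is lost.
idle_results = iter([False, True])
sent = []
attempts = wait_until_ready(
    send_kernel_info_request=lambda: sent.append("kernel_info_request"),
    wait_for_reply=lambda timeout: True,
    wait_for_idle=lambda timeout: next(idle_results),
)
print(attempts)  # 2
```

Passing the channel operations in as callables keeps the sketch independent of any particular zmq binding; in a real client they would wrap the shell and iopub sockets.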

-Min

Trevor Murphy

Mar 13, 2018, 6:11:01 PM
to Project Jupyter
I tried sending the following over email, but I guess it didn't post correctly.  Apologies for spam if this is your second time seeing my email.

> What you are likely running into is a failure to propagate subscriptions on the PUB/SUB channel.

Thanks, this was spot on.  I added a millisecond wait time and that fixed the problem (as fixed as I cared about for a toy project).

Now I'm curious, though.  I see that there's a ZMQ socket option, ZMQ_INVERT_MATCHING, which would cause the PUB socket to push all messages to all new SUB connections, iiuc.  This behavior also seems to fit the Jupyter messaging model since, per the docs, the IOPub topics are irrelevant and completely ignored because frontends just subscribe to all topics.

If I wanted to hack the Jupyter kernel code to set ZMQ_INVERT_MATCHING on the IOPub PUB side, where would I go do that?