Efficient Protocols and Serialisation with a View to High-Performance Hard Real-Time Analytics

Suminda Dharmasena

Nov 15, 2014, 1:17:27 AM
to jup...@googlegroups.com
Hi,

Something I have been pondering for a while that you might want to consider: is it possible to make highly efficient protocols and serialisation the default for all implementers, with JSON and TCP as fallback options? This would help with using Jupyter for hard real-time analytics, for example in finance, especially HFT.

Suminda

Min RK

Nov 15, 2014, 2:13:24 AM
to jup...@googlegroups.com
These have been considered, actually, and JSON + ZeroMQ work quite well within the applicable scope of IPython / Jupyter. IPython does support alternative serialization, such as msgpack or protocol buffers, though this rarely has a noticeable performance impact. No other kernels that I know of bother to support this, because it isn't especially useful in interactive sessions. It is only commonly used in IPython.parallel, where the connection files include serialization information and high message throughput starts to matter, but even there the effect is small.
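
For a rough sense of scale, here is a quick sketch (not IPython's actual code path; the payload and counts are made up) comparing JSON and msgpack encode/decode on a Jupyter-message-sized dict:

    # Rough comparison sketch; requires the msgpack package (pip install msgpack).
    import json
    import time

    import msgpack

    # A small dict shaped roughly like a Jupyter message (made-up contents).
    msg = {
        "header": {"msg_id": "abc123", "msg_type": "execute_request"},
        "content": {"code": "print(1 + 1)", "silent": False},
    }
    N = 100_000

    t0 = time.perf_counter()
    for _ in range(N):
        json.loads(json.dumps(msg))
    t_json = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(N):
        msgpack.unpackb(msgpack.packb(msg))
    t_msgpack = time.perf_counter() - t0

    print(f"json:    {t_json:.3f}s for {N} encode/decode cycles")
    print(f"msgpack: {t_msgpack:.3f}s for {N} encode/decode cycles")

On typical hardware both come out in the microsecond range per message, which is why the choice rarely shows up in interactive use.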

Transports other than ZeroMQ over tcp or ipc are not planned to be supported, and I don't imagine the added complexity would provide much benefit.
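
To be concrete, with pyzmq the transport is already just a choice of endpoint string (the endpoint names and port here are made up; ipc:// endpoints are POSIX-only):

    import zmq

    ctx = zmq.Context.instance()
    a = ctx.socket(zmq.PAIR)
    b = ctx.socket(zmq.PAIR)

    endpoint = "ipc:///tmp/kernel-demo.sock"  # or "tcp://127.0.0.1:5555"
    a.bind(endpoint)
    b.connect(endpoint)

    a.send(b"same code either way")
    print(b.recv())

    a.close()
    b.close()
    ctx.term()
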
What environments are you thinking of where this level of performance is relevant? You mention HFT, but I don't see how latency between the IPython user and their kernel would matter there.

-MinRK

Suminda Dharmasena

Nov 15, 2014, 2:29:43 AM
to Min RK, jup...@googlegroups.com
I am thinking of the case where one cell is in one language and another cell is in a different language. You design your algo in a notebook and then use the same thing to go live. Also, perhaps you could consider Nanomsg (https://github.com/nanomsg/nanomsg) as a replacement for 0MQ, for licensing reasons? On top of this, if the transport layer is native to the language used, things will speed up because you do not have to go through an FFI or something of that sort.
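
For illustration, a minimal sketch assuming the third-party nanomsg-python bindings (an assumption on my part; I have not benchmarked this). The point is only that the socket API mirrors 0MQ's, so a swap is at least plausible:

    # Sketch assuming the nanomsg-python bindings are installed.
    from nanomsg import PAIR, Socket

    s1 = Socket(PAIR)
    s1.bind("inproc://demo")
    s2 = Socket(PAIR)
    s2.connect("inproc://demo")

    s1.send(b"hello from nanomsg")
    print(s2.recv())

    s1.close()
    s2.close()
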


Doug Blank

Nov 15, 2014, 10:46:15 AM
to jup...@googlegroups.com
Suminda,

You might be interested in the discussion of Subkernels in the metakernel project.

Metakernel uses the IPython infrastructure (and Python) to provide kernels and parallel computing in other languages. We've discussed using a serialization method other than JSON for intra-language communication, as part of a reorganization called Subkernels. Daniel Mendler (minad) has started a prototype, but we're largely in the discussion stage.
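
To give a flavor of the idea (purely illustrative; this is not minad's prototype or any settled Subkernels design), the negotiation could be as simple as a named pack/unpack pair with JSON as the guaranteed fallback:

    import json

    # Registry of named serializers: name -> (pack, unpack).
    SERIALIZERS = {
        "json": (
            lambda obj: json.dumps(obj).encode("utf-8"),
            lambda data: json.loads(data.decode("utf-8")),
        ),
    }

    try:
        import msgpack
        SERIALIZERS["msgpack"] = (msgpack.packb, msgpack.unpackb)
    except ImportError:
        pass  # msgpack not installed; JSON remains the only option

    def get_serializer(name="json"):
        """Return a (pack, unpack) pair, falling back to JSON."""
        return SERIALIZERS.get(name, SERIALIZERS["json"])

    pack, unpack = get_serializer("msgpack")
    assert unpack(pack({"x": 1})) == {"x": 1}
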

I suspect that Min is right: all of the complexity probably won't be worth the small gains in performance. Perhaps if one is frequently exchanging lots of data between languages it might be worth it, but that sounds like a complex solution before the problem has even been demonstrated and measured. In any event, you are welcome to join the discussion and prototyping of Subkernels.
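
On the measuring point, here is a rough sketch of the kind of baseline I would want first: time whole messages over an in-process ZeroMQ PAIR, so serialization cost can be judged against the rest of the messaging overhead (payload and counts are made up):

    import json
    import time

    import zmq

    ctx = zmq.Context.instance()
    a = ctx.socket(zmq.PAIR)
    b = ctx.socket(zmq.PAIR)
    a.bind("inproc://bench")   # with inproc, bind must happen before connect
    b.connect("inproc://bench")

    msg = {"header": {"msg_type": "execute_request"},
           "content": {"code": "1 + 1"}}
    N = 10_000

    t0 = time.perf_counter()
    for _ in range(N):
        a.send(json.dumps(msg).encode("utf-8"))
        json.loads(b.recv().decode("utf-8"))
    dt = time.perf_counter() - t0
    print(f"{N} messages in {dt:.3f}s ({dt / N * 1e6:.1f} us each)")

    a.close()
    b.close()
    ctx.term()
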

-Doug

Suminda Dharmasena

Nov 17, 2014, 2:16:45 AM
to jup...@googlegroups.com
Does Subkernels have its own list, since it looks like an individual project? Or is it fine to discuss it here? If you are streaming data at high volume and the code is in different languages, you will be crossing language boundaries very often.

Since Jupyter is looking to give equal citizenship to all languages, maybe the infrastructure can also be implemented in different languages. This will not happen overnight, but the best approach would be a short, concise spec on how it can be done; ideally it should not be too long.

Doug Blank

Nov 17, 2014, 4:27:32 PM
to jup...@googlegroups.com


On Monday, November 17, 2014 2:16:45 AM UTC-5, Suminda Dharmasena wrote:
Does Subkernels have its own list, since it looks like an individual project? Or is it fine to discuss it here? If you are streaming data at high volume and the code is in different languages, you will be crossing language boundaries very often.

I think the idea is that it could be integrated into metakernel, as it has magics for starting and communicating with a single kernel (%kernel module ClassName, %kx, and %%kx) and with clusters of kernels (%parallel module ClassName, %px, and %%px). But it could well spin off into another project. The on-line metakernel docs describe these magics.
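
For example, usage might look like this in a notebook (the module and class names here, bash_kernel and BashKernel, are stand-ins, not a tested configuration): %kernel starts a single subkernel from a module and class, %kx runs one line in it, and %%kx runs a whole cell there.

    %kernel bash_kernel BashKernel
    %kx echo "one line, run in the subkernel"

    %%kx
    echo "a whole cell, run in the subkernel"
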


Since Jupyter is looking to give equal citizenship to all languages, maybe the infrastructure can also be implemented in different languages. This will not happen overnight, but the best approach would be a short, concise spec on how it can be done; ideally it should not be too long.

Sounds like you have some ideas!

-Doug 