We’ve just released jupyter-client 5.1, including a definition of version 5.2 of the Jupyter protocol.
The main change in protocol 5.2 resolves an ambiguity about the cursor_pos field in inspection and completion messages when the code contains non-BMP (“astral plane”) characters. The protocol has always defined cursor_pos in terms of Unicode characters, but widespread bugs in frontends (all known JavaScript-based frontends, and some Python frontends when running with Python 2 on certain platforms) mean that surrogate pairs were not handled correctly, so cursor_pos has been ambiguous in practice until now. Given the prevalence of the bug, kernel authors may choose to interpret cursor_pos coming from protocol version 5.1 or earlier as a UTF-16 code unit offset rather than the encoding-independent code point offset, because that matches the actual behavior of the most widespread frontend (Jupyter Notebook). This would not technically be a correct implementation of protocol v5.1, but if your users mostly use the notebook it would match your frontend.
Kernel and frontend authors should check the length of a string containing a code point >= 0x10000, such as ‘𝐚’. If the length of such a single-character string is two, you will need to translate between string index and Unicode code point index for cursor_pos. Only once this has been addressed should your kernel or frontend identify itself as implementing protocol version 5.2.
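As a rough illustration of the check and the translation, here is a minimal Python 3 sketch. The function names (codepoint_to_utf16_offset, utf16_to_codepoint_offset, normalize_cursor_pos) are hypothetical and not part of jupyter-client, and the version test simply assumes the sender's protocol version is available as a string such as "5.1". Treating pre-5.2 offsets as UTF-16 code units is the optional compatibility behavior described above, not something the protocol requires.

```python
ASTRAL_CHAR = "\U0001d41a"  # a code point >= 0x10000, outside the BMP


def codepoint_to_utf16_offset(text, cp_offset):
    """Convert a code point offset into a UTF-16 code unit offset."""
    # Each astral-plane character occupies two UTF-16 code units.
    return sum(2 if ord(ch) >= 0x10000 else 1 for ch in text[:cp_offset])


def utf16_to_codepoint_offset(text, utf16_offset):
    """Convert a UTF-16 code unit offset into a code point offset.

    An offset that falls inside a surrogate pair is rounded up to the
    position just after that character.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        units += 2 if ord(ch) >= 0x10000 else 1
    return len(text)


def normalize_cursor_pos(code, cursor_pos, protocol_version):
    """Return cursor_pos as a code point offset (hypothetical helper).

    Assumption: offsets from senders older than protocol 5.2 are UTF-16
    code unit offsets, matching the frontend bug described in this post.
    """
    major, minor = (int(part) for part in protocol_version.split(".")[:2])
    if (major, minor) < (5, 2):
        return utf16_to_codepoint_offset(code, cursor_pos)
    return cursor_pos


# The length check from the paragraph above: on Python 3 this is 1, so
# native string indices are already code point offsets.
needs_translation = len(ASTRAL_CHAR) == 2
```

For the string "a" + ASTRAL_CHAR + "b", the position just before "b" is code point offset 2 but UTF-16 offset 3, so normalize_cursor_pos maps 3 to 2 for a "5.1" sender and leaves 2 unchanged for a "5.2" sender.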
Notebook 5.1 (coming soon) and JupyterLab 0.24 (already released) implement this correctly and identify as protocol 5.2.
-Min