Engineering Notebook - Jupytext The Leonine Way


Thomas Passin

Oct 26, 2024, 9:03:34 AM
to leo-editor
This is a long post so I'm putting a summary at the start.

1. Leo can provide a very good way for a user to create and edit Jupyter notebooks.
2. The exploratory work for @jupytext files does not provide a proper Leonine experience.
3. The usual Leo approach is weak when it comes to working with documentation, and documentation mixed with code, in contrast to programs, itemized lists, and the like, where Leo is strong.
4. This weakness can be overcome; below is a discussion of how and why, along with some design suggestions.

There has been a flurry of activity in the last few weeks to add the ability to work with Jupyter notebooks by means of an intermediate file format. That format is provided by the Jupytext program. However, in the rush no one seems to have thought much about what Leo can bring to the table, or why anyone would want to use Leo for this purpose compared with, say, the Jupyter plugin for Visual Studio Code, which seems very good and very readable to me.

For those who don't know yet, a Jupyter notebook's file format is JSON.  JSON was designed to interchange data, including program structures. It was not designed for documentation or readability. Jupyter notebooks contain a sequence of "cells" - basically nodes - that are either Markdown text or code cells. There is no other structure. Jupytext's contribution is to flatten the nested JSON data structure to a flat text format with what amounts to sentinels - each line is commented out, and there are a few specially-formatted comments. One of these special comment lines marks the start of a Markdown cell, and another marks the start of a Python cell.
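The flattening step can be illustrated with a rough Python sketch. It assumes a notebook in nbformat 4 JSON, meaning a top-level "cells" list whose entries carry "cell_type" and "source". The function name `notebook_to_percent` is hypothetical, and the sketch is not Jupytext's implementation: real Jupytext also writes a YAML header and cell metadata, and it supports other languages and formats. Only the Markdown lines are commented out here. As the correction later in this thread points out, Jupytext leaves Python code uncommented.

```python
import json


def notebook_to_percent(ipynb_text: str) -> str:
    """Flatten nbformat-4 notebook JSON into simplified percent-format text."""
    nb = json.loads(ipynb_text)
    out = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be either one string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        lines = source.splitlines()
        if cell.get("cell_type") == "markdown":
            out.append("# %% [markdown]")
            # Markdown lines are commented out so the file stays valid Python.
            out.extend("# " + line if line else "#" for line in lines)
        elif cell.get("cell_type") == "code":
            out.append("# %%")
            # Code lines are copied unchanged.
            out.extend(lines)
        else:
            continue  # raw cells and other types are skipped in this sketch
        out.append("")  # blank line between cells
    return "\n".join(out)


# A tiny two-cell notebook, built inline for the example.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some text."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "x = 1\nprint(x)"},
    ],
})

print(notebook_to_percent(example))
```

Each "# %%" line plays the role of a sentinel. Nothing else in the text records the cell boundaries.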

Leo is very strong in these areas:

    1. Structure is indicated by and can be changed using the outline;

    2. Only the contents of a single node are visible and editable at one time.  This is excellent for concentrating on programming and itemized lists of all kinds.

    3. Any markup, such as Leo's sentinels, that is needed to support structure or other Leo features is hidden from the user. There's hardly any visual clutter. IMO this is a crucial feature that Leo offers. It makes writing the code that saves and restores outlines and external files very complicated, but the user doesn't need to know about that.

Leo is not nearly as capable in supporting writing and documentation. That's because even though structure is important, it's also important to be able to see and edit the flow of the document from node to node and how nearby parts work together. An example is creating documentation using the rst3 command and Sphinx. The mechanics of this process are excellent, but editing beyond the node level requires a lot of mental effort and trial runs with Sphinx. The Viewrendered3 (VR3) plugin is designed to help with this problem by letting the user view an entire subtree.

Jupyter notebooks are serial combinations of documentation and code. Basically they are a limited form of "literate programming".  The code cells are usually small enough that they don't need to be structured.  They are displayed by a Jupyter-viewing program in a clean, very readable way.

The Jupytext work over the last few weeks presents the user with a direct view of the Jupytext-formatted file, sentinels, embedded comments, and all. A user has to hand-edit the file while being distracted by the extra markup and without being able to see how the parts flow into one another. Yes, parts can be moved around and navigated to using the outline, but the editing experience is inferior. There is also no syntax highlighting for code nodes - since the code is all commented out - nor can even rudimentary syntax checks be carried out. As for how the document will look in a Jupyter program - well, that is unknown until the file is saved to the .ipynb form and reloaded into Jupyter.

The Leonine Way
------------------
1.  Leo should present a view of a Jupyter notebook file to the user without sentinels or visual clutter, just like it does with other external file types.

2. Leo should be able to recreate the file's structure on reloading, or at least a close approximation to it.

3. Code cells should be syntax-highlighted and preferably able to be at least syntax-checkable.

4. Code execution would be a bonus but not required.

5. The user should need to know a minimum of special forms such as directives or other special markup features, and if there are any they should have a form similar to other Leo forms.

6. There should be a way to show a view of adjacent or nearby nodes so that the user can make sure they work together as intended. Showing a fully rendered view of nodes' Markdown is a desirable bonus to avoid round-tripping to Jupyter.

7. There should be an easy way to extend the file handling and rendering capabilities to use other programming languages besides Python.

8. The process of converting, importing, and exporting the files should be invisible to the user, just as for any other external file that Leo currently supports.

Thoughts on Design
-------------------
Leo already has pieces that cover some of the items above. The rst3 command gives a way to maintain structure without embedding it in an external file. With rst3, a Sphinx document becomes an ordinary Leo tree, not an external file. Running the command creates file(s) in the format that Sphinx needs. A "jupyter" command would do a similar job, and it would be far simpler.
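Here is a rough sketch of what such a "jupyter" command might do, written by analogy with rst3. It walks nodes in outline order and writes one notebook cell per node. The `Node` class and `nodes_to_ipynb` function are hypothetical stand-ins and do not use Leo's actual position or vnode API. The sketch also assumes the convention discussed below: a body that begins with "@language md" becomes a Markdown cell, and every other body becomes a code cell.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a Leo node: just a headline and a body."""
    headline: str
    body: str


def nodes_to_ipynb(nodes: List[Node]) -> str:
    """Write one notebook cell per node, in outline order."""
    cells = []
    for node in nodes:
        lines = node.body.splitlines()
        first = lines[0].strip() if lines else ""
        is_markdown = first == "@language md"
        if first.startswith("@language"):
            lines = lines[1:]  # the directive is Leo markup, not cell content
        source = "\n".join(lines)
        if is_markdown:
            cells.append({"cell_type": "markdown", "metadata": {},
                          "source": source})
        else:
            cells.append({"cell_type": "code", "metadata": {},
                          "execution_count": None, "outputs": [],
                          "source": source})
    notebook = {
        "nbformat": 4,
        # Minor version 4 predates the per-cell "id" field added in 4.5.
        "nbformat_minor": 4,
        "metadata": {"language_info": {"name": "python"}},
        "cells": cells,
    }
    return json.dumps(notebook, indent=1)


tree = [
    Node("Intro", "@language md\n# Analysis\nLoad the data."),
    Node("Load data", "@language python\ndata = [1, 2, 3]\nprint(sum(data))"),
]
print(nodes_to_ipynb(tree))
```

As with rst3, the structure lives in the Leo tree. The .ipynb file is only an output that gets regenerated when needed.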

VR3 is designed to handle 4, 5, and 6.  For example, a Jupytext file starts a new markdown cell with this line:

    # %% [markdown]

A code cell is started by:

    # %%

VR3 can recognize blocks delineated with @language directives:

    @language md
    This starts a block of markdown.

    @language python
    # Python code goes here

To get a jupytext file to render in VR3, just run a text substitution for these forms and then remove the remaining leading "#" characters.  That can be done during import of a notebook file, and doing it will give the user a clean view of the contents without visual clutter.  If we're going to have code to convert for VR3 to render we might as well have the same code up-front at import time.
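A minimal Python sketch of that substitution follows. It assumes only the two percent-format markers shown above and treats every code cell as Python. The function name is hypothetical, and this is not VR3 or Leo code. Markdown lines lose their leading "# ", and code lines, including their own "#" comments, are left alone, because Jupytext does not comment out code.

```python
def percent_to_vr3(text: str) -> str:
    """Turn percent-format cell markers into VR3-style @language directives."""
    out = []
    in_markdown = False
    for line in text.splitlines():
        # Check the Markdown marker first: it also starts with "# %%".
        if line.startswith("# %% [markdown]"):
            out.append("@language md")
            in_markdown = True
        elif line.startswith("# %%"):
            out.append("@language python")
            in_markdown = False
        elif in_markdown and line.startswith("# "):
            out.append(line[2:])  # uncomment a Markdown line
        elif in_markdown and line == "#":
            out.append("")  # a bare "#" is an empty Markdown line
        else:
            out.append(line)  # code lines and blank lines pass through
    return "\n".join(out)


sample = (
    "# %% [markdown]\n"
    "# # Results\n"
    "# The sum is computed below.\n"
    "\n"
    "# %%\n"
    "total = sum([1, 2, 3])\n"
    "print(total)\n"
)
print(percent_to_vr3(sample))
```

A Jupytext header block at the top of the file would pass through unchanged in this sketch, so a real importer would need to handle that separately.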

The most challenging task will be how to merge changes that someone makes inside Jupyter with the internal arrangement of nodes inside a Leo outline. I don't think that will be easy, but it's essentially the same problem that has already been solved for @clean external files.

Thomas Passin

Oct 26, 2024, 9:14:48 AM
to leo-editor
I see that I forgot that the Python code in a Jupytext file is already uncommented (shucks, I knew that yesterday!). So we do get syntax highlighting as is. All the rest still applies.

Edward K. Ream

Oct 26, 2024, 4:55:59 PM
to leo-e...@googlegroups.com
On Sat, Oct 26, 2024 at 8:03 AM Thomas Passin <tbp1...@gmail.com> wrote:

> This is a long post so I'm putting a summary at the start.

> 1. Leo can provide a very good way for a user to create and edit Jupyter notebooks.

@jupytext already does.

> 2. The exploratory work for @jupytext files does not provide a proper Leonine experience.

I have no idea why you assert something so obviously incorrect:

- @jupytext is a thin wrapper around @clean, so by definition @jupytext provides a Leonine experience.
- @jupytext works well despite the limitations of Jupyter notebooks.
- @jupytext already passes the acid test: round-tripping works well.

Edward

HaveF HaveF

Oct 26, 2024, 10:53:34 PM
to leo-editor
> 6. There should be a way to show a view of adjacent or nearby nodes so that the user can make sure they work together as intended. Showing a fully rendered view of nodes' Markdown is a desirable bonus to avoid round-tripping to Jupyter.

Thank you, Thomas, for such detailed thoughts.

I prepared a post yesterday to promote Leo in other communities. Here is the link to my post (I put it in a GitHub issue so that if you think something is wrong, you can modify it directly). The post roughly describes my main use case: just using Leo to organize the logic in Jupyter ipynb files.

I am a relatively heavy user of JupyterLab. Because JupyterLab provides real-time code completion, I basically don't write ipynb code in Leo, so editing and rendering in Leo are not a big problem for me (though of course it would be good to have them).

Of course, I think the `round-tripping` you mentioned may be a problem. Users may find it a little messy to switch back and forth between Leo and JupyterLab. But I think this problem is not easy to solve. Considering that the return on investment may not be large enough, I am willing to live with it. In general, I am quite satisfied with the way I imagine using Jupytext.


Thomas Passin

Oct 27, 2024, 12:37:12 AM
to leo-editor
On Saturday, October 26, 2024 at 10:53:34 PM UTC-4 iamap...@gmail.com wrote:
>> 6. There should be a way to show a view of adjacent or nearby nodes so that the user can make sure they work together as intended. Showing a fully rendered view of nodes' Markdown is a desirable bonus to avoid round-tripping to Jupyter.

> Thank you, Thomas, for such detailed thoughts.

: )
 
> Of course, I think the `round-tripping` you mentioned may be a problem. Users may find it a little messy to switch back and forth between Leo and JupyterLab. But I think this problem is not easy to solve. Considering that the return on investment may not be large enough, I am willing to live with it. In general, I am quite satisfied with the way I imagine using Jupytext.

I didn't mean that round-tripping could be a problem. I meant that it imposes extra steps, incurs a time delay, and adds distraction. These things get in the way of the work of composing and thinking out a notebook. As an example, here is how I typically develop Sphinx documents, which use reStructuredText and the rst3 command. I will draft a partial structure in Leo, then work on certain nodes. From time to time I will look at them with VR3 to catch any RsT syntax errors and make sure that the appearance seems right. I also find that viewing a rendered version helps me catch typos and other editorial errors, and even errors in flow and thinking.

Next I will use VR3 to look at a subtree that contains the nodes I am working on. I may even export the rendering to the browser. At this point I may execute a command that runs rst3 and then Sphinx against the RsT files. I view the Sphinx output in the browser. Then repeat. This kind of workflow is how I think users will be able to make the best use of Leo. It's fast and fairly painless, and it helps me focus on what I'm writing rather than on the mechanics and distraction of constantly going back and forth between Leo and Sphinx (I'm sure the same will apply to Jupyter).

Let me extend my view of working with Jupytext files in Leo. Code cells could be checked for syntax errors by running py_compile against a code node. Right there you have one less round trip to Jupyter. Get rid of the leading "#" characters and "# %%" prefixes. The hash characters aren't necessary and add visual clutter. The conversion I do to make a jupytext file usable by VR3 is to replace each "# %% [markdown]" line with "@language md", and each "# %%" line that heads a code cell with "@language python". That's the syntax that VR3 uses, and you can substitute any language name that VR3 knows about, so by going this route we would already be able to expand to non-Python languages. Directives like these are already familiar to Leo users and are not displayed by VR3. Next I remove all the other leading "#" characters. With a few other simple textual changes we can get a very clean set of nodes that is easy to read and write.

Since I would have to write code to make these simple substitutions for VR3 to render a notebook, why not have them up front in the conversion process?  They can easily be reversed when the file is written.
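A minimal sketch of reversing the substitution at write time follows. It assumes the node text uses "@language md" for Markdown and treats any other @language directive as the start of a code cell. The function name is hypothetical. Whitespace details may not match what Jupytext itself writes, such as how blank lines inside a Markdown cell are handled, so a real implementation should be checked by round-tripping through Jupytext.

```python
def vr3_to_percent(text: str) -> str:
    """Turn @language directives back into percent-format cell markers."""
    out = []
    in_markdown = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "@language md":
            out.append("# %% [markdown]")
            in_markdown = True
        elif stripped.startswith("@language "):
            out.append("# %%")  # any other language starts a code cell
            in_markdown = False
        elif in_markdown:
            # Re-comment Markdown lines; blank lines stay blank in this sketch.
            out.append("# " + line if line else "")
        else:
            out.append(line)  # code lines are written back unchanged
    return "\n".join(out)


vr3 = (
    "@language md\n"
    "# Results\n"
    "The sum is below.\n"
    "\n"
    "@language python\n"
    "total = sum([1, 2, 3])"
)
print(vr3_to_percent(vr3))
```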

Regardless of one's views about removing the "#" and "# %%" formatting characters, in my experience, streamlining the whole editing and review process, seeing nodes in their surrounding context, and reducing the number of round trips are the keys to a good, efficient Leo experience.
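One of the round-trip savers proposed above is to check code cells with py_compile before the notebook goes back to Jupyter. py_compile works on files. For a node body that is already in memory, the built-in compile() performs the same parse-and-compile step without writing a file, so this sketch swaps it in. The helper name is hypothetical. The exact error text varies between Python versions, so the output of the second call is not shown.

```python
from typing import Optional


def check_cell_syntax(source: str, label: str = "<cell>") -> Optional[str]:
    """Return None if `source` compiles as Python, else a one-line error.

    compile() only parses and compiles the text; nothing is executed.
    """
    try:
        compile(source, label, "exec")
    except SyntaxError as exc:  # also covers IndentationError and TabError
        return f"{label}, line {exc.lineno}: {exc.msg}"
    return None


print(check_cell_syntax("total = sum([1, 2, 3])"))  # prints None
print(check_cell_syntax("for x in range(3)\n    print(x)", "Loop node"))
```

Passing the node's headline as `label` makes the message point at the offending node.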

And, BTW, VR3 is capable of extracting the code blocks from the notebook text and viewing just the code.  I don't have a built-in way to export only the code just now (it wouldn't be hard), but there's a little trick by which it can be done.  This is handy when you want to create a stand-alone program but still want to enjoy the advantages of a notebook.
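The post does not say how VR3 extracts the code, so the following is an independent sketch of the idea, not VR3's implementation. It assumes the "@language md" / "@language python" convention described above and keeps only the lines in blocks whose language matches. The function name is hypothetical.

```python
def extract_code(text: str, language: str = "python") -> str:
    """Keep only the lines that sit inside `@language <language>` blocks."""
    keep = False  # text before the first directive is dropped
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@language "):
            # Start keeping lines only when the block's language matches.
            keep = stripped.split(maxsplit=1)[1] == language
            continue  # never copy the directive itself
        if keep:
            kept.append(line)
    return "\n".join(kept)


doc = (
    "@language md\n"
    "# Notes\n"
    "@language python\n"
    "x = 2\n"
    "@language md\n"
    "More text.\n"
    "@language python\n"
    "print(x * 3)\n"
)
program = extract_code(doc)
print(program)
```

Saving `program` to a .py file would produce the kind of stand-alone program described above.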

I know I'm emphasizing the potential role of VR3, probably too much, but I have found it to work synergistically with Leo's trees and nodes for developing documentation and notebooks.  I'm trying to pass along my experience in the hopes of making for a better experience in working with Jupyter projects.

I have attached the HTML output of the VR3 rendering of a fragment of one of my notebooks. I bet it will remind you of some other kinds of notebooks. Even though VR3 is pretty simple-minded and doesn't have anywhere near the capabilities of Jupyter, it can get a lot of good work done. The graphic was created by executing the code in the notebook. I expect working with jupytext files to be at least as good - if we can make the process smooth and painless.
vr3-notebook-example.zip

HaveF HaveF

Oct 27, 2024, 5:23:23 AM
to leo-editor
> I didn't mean that round-tripping could be a problem. I meant that it imposes extra steps, incurs a time delay, and adds distraction. These things get in the way of the work of composing and thinking out a notebook. As an example, here is how I typically develop Sphinx documents, which use reStructuredText and the rst3 command. I will draft a partial structure in Leo, then work on certain nodes. From time to time I will look at them with VR3 to catch any RsT syntax errors and make sure that the appearance seems right. I also find that viewing a rendered version helps me catch typos and other editorial errors, and even errors in flow and thinking.

Hi Thomas, thanks for your detailed explanation. I believe I understand what you mean: you want to create a smoother workflow, which is what the Leonine way really means to you. Haha, I can see how deep your feelings for Leo are.

I agree that a smoother workflow would bring a very good experience and let people reach a flow state when using Leo. My main concerns are: 1. this will increase the coding workload and add a maintenance burden; 2. if no one uses this feature, then, just like the previous IPython bridge, no one will notice when it stops working. Building it would be a waste of time.

So I think that if more people use Leo with Jupyter/Jupytext, it makes sense to add it; if not many people use it, keeping basic rendering may be enough. What do you think?


Edward K. Ream

Oct 27, 2024, 6:57:18 AM
to leo-editor
On Saturday, October 26, 2024 at 8:03:34 AM UTC-5 tbp1...@gmail.com wrote:

> There has been a flurry of activity in the last few weeks to add the ability to work with Jupyter notebooks by means of an intermediate file format. That format is provided by the Jupytext program.
> However, in the rush no one seems to have thought much about what Leo can bring to the table, or why anyone would want to use Leo for this purpose compared with, say, the Jupyter plugin for Visual Studio Code, which seems very good and very readable to me.

Huh? @jupytext makes it possible to organize Jupyter Notebooks with Leo outlines. What more do you want?

> For those who don't know yet, a Jupyter notebook's file format is JSON. JSON was designed to interchange data, including program structures. It was not designed for documentation or readability.

Yes, and @jupytext mostly hides the details.
 
> Any markup, such as Leo's sentinels, that is needed to support structure or other Leo features is hidden from the user. There's hardly any visual clutter. IMO this is a crucial feature that Leo offers. It makes writing the code that saves and restores outlines and external files very complicated, but the user doesn't need to know about that.

I agree.

> Leo is not nearly as capable in supporting writing and documentation.

Huh? What about LeoDocs.leo??
 
> The Leonine Way
> ------------------
> 1. Leo should present a view of a Jupyter notebook file to the user without sentinels or visual clutter, just like it does with other external file types.

@jupytext does this.

> 2. Leo should be able to recreate the file's structure on reloading, or at least a close approximation to it.

@jupytext does this. @jupytext uses the @clean update algorithm.

> 3. Code cells should be syntax-highlighted and preferably able to be at least syntax-checkable.

That could be done, but it wouldn't be easy. I'm not interested.

> 4. Code execution would be a bonus but not required.

This is a misguided goal. Leo cannot even closely approximate the environment provided by Jupyter Notebook. Attempting to do so would be a waste of time.

> 5. The user should need to know a minimum of special forms such as directives or other special markup features, and if there are any they should have a form similar to other Leo forms.

The user needs to know about Jupyter notebooks.

> 6. There should be a way to show a view of adjacent or nearby nodes so that the user can make sure they work together as intended. Showing a fully rendered view of nodes' Markdown is a desirable bonus to avoid round-tripping to Jupyter.

VR3 is free to do anything it likes.

> 7. There should be an easy way to extend the file handling and rendering capabilities to use other programming languages besides Python.

The only available Leo setting is @string jupytext-fmt = py:percent.
The jupytext documentation is far from clear on the meaning of this option.

> 8. The process of converting, importing, and exporting the files should be invisible to the user, just as for any other external file that Leo currently supports.

It already is.

Edward

Offray Vladimir Luna Cárdenas

Nov 6, 2024, 1:01:15 PM
to leo-e...@googlegroups.com

Hi,

On 26/10/24 8:03, Thomas Passin wrote:

> There has been a flurry of activity in the last few weeks to add the ability to work with Jupyter notebooks by means of an intermediate file format. That format is provided by the Jupytext program. However, in the rush no one seems to have thought much about what Leo can bring to the table, or why anyone would want to use Leo for this purpose compared with, say, the Jupyter plugin for Visual Studio Code, which seems very good and very readable to me.

[...]

I have enjoyed this thread very much, as I was one of the first advocates of combining the interactivity of Jupyter/IPython with Leo's self-referential document tree that is programmable from inside itself. So much so that I created my own outliner, Grafoscopio, in Pharo Smalltalk, as I have said before. My response is about how those combinations, and alternatives beyond Jupyter (but in "conversation" with it and inspired by it), can happen in other tech stacks and communities. I include some links for those who are new to the conversation or want to go deeper into what those other technologies offer, to bring inspiration and cross-pollination over here.

As a teaser, here is a screenshot of a Jupyter notebook (at the right), side by side with the importer running inside a Lepiter [1] notebook (at the left), inside the Glamorous Toolkit [1a] environment, powered by Pharo [1b].

[1] https://lepiter.io/feenk/introducing-lepiter--knowledge-management--e2p6apqsz5npq7m4xte0kkywn/
[1a] https://gtoolkit.com/
[1b] https://pharo.org/

It is interesting to see how interactive notebooks are evolving in several languages and environments, with Jupyter as the de facto standard that other communities try to connect with or surpass through new designs and possibilities. Some are single-language notebooks, like Elixir's Livebook [2], Clojure's Clerk [2a], or even my own Pharo-based Grafoscopio [2b], which try to showcase the advantages of embracing the features of those particular languages and environments. Others are multilingual, like Lepiter or Nextjournal [2c], which support Jupyter import and Python coding while being built on a different core technology stack to introduce features like malleability [3] or real-time collaboration. In the same vein, I wonder how Leo could bring a new writing experience to interactive documents, beyond reading/writing compatibility with Jupyter.

[2] https://livebook.dev/
[2a] https://clerk.vision/
[2b] https://mutabit.com/grafoscopio/en.html
[2c] https://nextjournal.com/
[3] https://malleable.systems/



> For those who don't know yet, a Jupyter notebook's file format is JSON. JSON was designed to interchange data, including program structures. It was not designed for documentation or readability. Jupyter notebooks contain a sequence of "cells" - basically nodes - that are either Markdown text or code cells. There is no other structure. Jupytext's contribution is to flatten the nested JSON data structure to a flat text format with what amounts to sentinels - each line is commented out, and there are a few specially-formatted comments. One of these special comment lines marks the start of a Markdown cell, and another marks the start of a Python cell.

> Leo is very strong in these areas:

[...]

The use of JSON for document storage, while understandable (in Jupyter and Lepiter, for example), is unfortunate. I like the path toward human- and diff-friendly formats that we followed with Grafoscopio (which at the beginning used STON [4] with embedded Markdown), and that Livebook (with its own Markdown variant; yes, another one!) and Clerk (which is just plain Clojure code in a namespace) also followed. Since Lepiter launched in 2021, we have been migrating Grafoscopio from "plain" Pharo to GToolkit/Lepiter, and I chose Markdeep [4a] with extended metadata. This has allowed us to exchange and publish data stories like this [4b] or this [4c], with advantages over all other formats, particularly ipynb notebooks, as explained here [4d]. If you look at the source code of [4b] or [4c], it's just a web-renderable Markdown variant (another one! closer to the original) inside HTML divs with STON metadata in them.

[4] https://github.com/svenvc/ston/
[4a] https://casual-effects.com/markdeep/
[4b] https://mutabit.com/repos.fossil/gig/doc/trunk/wiki/en/gig-portable-wiki--1apbv.md.html
[4c] https://mutabit.com/repos.fossil/malleable-systems/doc/trunk/wiki/en/malleable-systems-wiki--23fm1.md.html
[4d] https://lists.pharo.org/empathy/thread/RJ6RKBL2UCG6RURFEW7L3YLZLYMPYAVR?hash=4QVSF767ZW2ZIICNU4KIJR4DOW6IZEUA#4QVSF767ZW2ZIICNU4KIJR4DOW6IZEUA



> Leo is not nearly as capable in supporting writing and documentation. That's because even though structure is important, it's also important to be able to see and edit the flow of the document from node to node and how nearby parts work together. An example is creating documentation using the rst3 command and Sphinx. The mechanics of this process are excellent, but editing beyond the node level requires a lot of mental effort and trial runs with Sphinx. The Viewrendered3 (VR3) plugin is designed to help with this problem by letting the user view an entire subtree.

> Jupyter notebooks are serial combinations of documentation and code. Basically they are a limited form of "literate programming". The code cells are usually small enough that they don't need to be structured. They are displayed by a Jupyter-viewing program in a clean, very readable way.

> The Jupytext work over the last few weeks presents the user with a direct view of the Jupytext-formatted file, sentinels, embedded comments, and all. A user has to hand-edit the file while being distracted by the extra markup and without being able to see how the parts flow into one another. Yes, parts can be moved around and navigated to using the outline, but the editing experience is inferior. There is also no syntax highlighting for code nodes - since the code is all commented out - nor can even rudimentary syntax checks be carried out. As for how the document will look in a Jupyter program - well, that is unknown until the file is saved to the .ipynb form and reloaded into Jupyter.

[...]

The advances in editing and viewing Jupytext inside Leo are quite interesting, even though I no longer use Python/Leo actively since departing for the Pharo/Smalltalk realm. I have seen some screenshots in this thread, and the result is close to the interactive, programmable outliner I dreamed of years ago. I wonder if a video could better convey the interactive experience of the Leo + Jupyter combination, particularly modifying, writing, calculating, and plotting inside the notebook. While Leo was not designed for interactive computing, writing, or live coding, I think its ideas have a lot to offer there.

Kudos to the community for this exploration.

Offray

Edward K. Ream

Nov 7, 2024, 6:01:14 AM
to leo-e...@googlegroups.com
On Wed, Nov 6, 2024 at 12:01 PM Offray Vladimir Luna Cárdenas <off...@riseup.net> wrote:

> I have enjoyed this thread pretty much, as I was one of the first advocates of a combination of the features of Jupyter/IPython regarding interactivity and Leo regarding a self-referential document tree programmable inside itself.

> I created my own outliner, Grafoscopio, in Pharo Smalltalk as I have [said] before. 

Thanks for the update. @jupytext should open the door to better cooperation between tools.

Edward
