ENB: Enhance @jupytext with Thomas's script


Edward K. Ream

Oct 29, 2024, 6:39:20 AM
to leo-editor

The fruitful collaboration with Thomas continues. This Engineering Notebook post discusses adapting Thomas's prototype script to enhance how Leo handles @jupytext nodes. This enhancement:


- will be rock solid.

- can cover edge cases with ease.

- will be invisible when viewed in a Jupyter notebook.

- should be complete in a day or three.


Background


The jupytext library can translate an .ipynb file to pseudo-Python text. Thomas's script:


- starts with an @jupytext node whose body text (root.b) contains all the pseudo-Python code.

- splits root.b into chunks.

- adds a child node for each chunk.

- clears (most of) root.b and adds an @others directive.


The details of this script don't much matter. It's the idea that counts.
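
As a rough illustration of the chunking idea (not Thomas's actual script), here is a minimal sketch. It assumes the text uses jupytext's percent format, where each cell starts with a line beginning with "# %%". The name split_into_chunks is hypothetical.

```python
import re

# Assumption: cells are delimited by lines starting with "# %%".
CELL_MARKER = re.compile(r'^# %%')

def split_into_chunks(text: str) -> list[str]:
    """Split text into chunks, each beginning at a cell marker line.

    Any text before the first marker becomes its own leading chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines(keepends=True):
        # A marker line closes the chunk in progress, if there is one.
        if CELL_MARKER.match(line) and current:
            chunks.append(''.join(current))
            current = []
        current.append(line)
    if current:
        chunks.append(''.join(current))
    return chunks
```

Every line lands in exactly one chunk, so joining the chunks reproduces the original text.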


Overview of enhanced @jupytext processing


Leo currently reads .ipynb files into the body text of @jupytext nodes. So Thomas's script starts where Leo's @jupytext read code ends. Inspired by the script, the read code will then create child nodes as described below in detail.


Safety


The enhanced @jupytext processing will work like one of Leo's importers.


Vitalije's great insight is that importers will round-trip correctly, provided the nodes they produce tile the incoming text without gaps. Let's call this the tiling property. The main line of Thomas's script meets this requirement and so will the enhancement.
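
The tiling property is easy to state as a check. The sketch below is only an illustration; the function name tiles is hypothetical and is not part of Leo's importer code.

```python
def tiles(text: str, chunks: list[str]) -> bool:
    """Return True if the chunks, joined in order, reproduce text exactly."""
    return ''.join(chunks) == text

text = "# %%\n2 + 666 + 4\n# %% [markdown]\n# This is a markdown cell\n"

# These chunks cover every character of text, in order.
good = ["# %%\n2 + 666 + 4\n", "# %% [markdown]\n# This is a markdown cell\n"]

# These chunks drop the markdown marker line, leaving a gap.
bad = ["# %%\n2 + 666 + 4\n", "# This is a markdown cell\n"]
```

An importer whose node bodies pass this check can write the file back unchanged.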


Splitting text into nodes


We can imagine various ways of creating child nodes. Let's not worry about the details just now. All that matters is that the tiling property holds.


I'll pick one way that seems most Leonine. If there are differences of opinion, we might add a new Leo setting.


Creating outline structure


Leo could make all nodes as direct children of the root @jupytext node. Users could easily reorder the nodes as they please. However, it should be worthwhile to create nodes that follow the implied hierarchy of markdown sections.


Some nodes of the hierarchy may be missing, but that's not a problem. The importer will create organizer nodes for each missing level. This scheme "just works" because organizer nodes are invisible to the .ipynb file.


Proof: @jupytext works like @clean, so Leo never writes headlines to the .ipynb file. Organizer nodes contain no text, so again they contribute nothing to the .ipynb file.


It's easy to create organizer nodes. A stack contains the positions of the last seen node at each level. Initially, the stack contains the root @jupytext level. When adding a new node, the importer will:


- Cut the stack back if the new level is less than the old.

- Replace the top of the stack if the new level remains unchanged.

- Create organizer nodes as needed if the new level is greater than the old.
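
The three stack rules above can be sketched as follows. This is only an illustration: the Node class stands in for Leo's positions, and build_tree and the "organizer" headline are hypothetical names, not Leo's API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A stand-in for a Leo node (assumption: not Leo's real classes)."""
    headline: str
    body: str = ''
    children: list['Node'] = field(default_factory=list)

def build_tree(root: Node, items: list[tuple[int, str, str]]) -> Node:
    """Attach (level, headline, body) items under root; levels start at 1."""
    # stack[i] is the last node seen at level i; stack[0] is the root.
    stack = [root]
    for level, headline, body in items:
        # Cut the stack back when the level decreases or stays the same.
        del stack[level:]
        # Create empty organizer nodes for any skipped levels.
        while len(stack) < level:
            organizer = Node('organizer')
            stack[-1].children.append(organizer)
            stack.append(organizer)
        node = Node(headline, body)
        stack[-1].children.append(node)
        stack.append(node)
    return root
```

For example, the levels 1, 3, 1 yield two children of the root, with one organizer node between the first child and the level 3 node. Organizer bodies are empty, so they add nothing to the written text.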


I've written this kind of code many times. Indeed, the base Importer class defines the i.create_placeholder method. The jupytext importer might use that method! And perhaps others.


Summary


Thomas's script shows how easy it is to split @jupytext text into nodes. Thank you Thomas!


The enhanced code constitutes a new importer. This importer:


- will be part of Leo's @jupytext support. There is no need for a separate command.

- will be rock solid because it will preserve the tiling property.

- will honor the implied hierarchy of markdown nodes, creating organizer nodes as needed.

- should be complete in a day or three. This is not my first rodeo.


Finally, the newly created outline structure will be invisible in Jupyter notebooks. Let the wild rumpus start!


All of your questions and comments are welcome.


Edward

Edward K. Ream

Oct 29, 2024, 6:47:26 AM
to leo-editor
On Tuesday, October 29, 2024 at 5:39:20 AM UTC-5 Edward K. Ream wrote:

The enhanced code constitutes a new importer. This importer...should be complete in a day or three.


I shall not delay Leo 6.8.2 for any reason. Leo 6.8.2 will go out the door on Friday, November 8.

I'll push the new importer to Leo 6.8.3 if I don't finish the PR by this Friday, November 1.

Edward

Thomas Passin

Oct 29, 2024, 9:05:48 AM
to leo-editor
On Tuesday, October 29, 2024 at 6:39:20 AM UTC-4 Edward K. Ream wrote:

The fruitful collaboration with Thomas continues. This Engineering Notebook post discusses adapting Thomas's prototype script to enhance how Leo handles @jupytext nodes. This enhancement:


- will be rock solid.

- can cover edge cases with ease.

- will be invisible when viewed in a Jupyter notebook.

- should be complete in a day or three.


[snip]

I've written this kind of code many times. Indeed, the base Importer class defines the i.create_placeholder method. The jupytext importer might use that method! And perhaps others.


This is exactly why I said it would be done more quickly if you do it. I can see what to do but I've never worked out the details before. You have them at your fingertips, especially matters of importers and building Leo node trees.

Edward K. Ream

Oct 29, 2024, 9:06:07 AM
to leo-editor
On Tuesday, October 29, 2024 at 5:39:20 AM UTC-5 Edward K. Ream wrote:

This Engineering Notebook post discusses adapting Thomas's prototype script to enhance how Leo handles @jupytext nodes.


I have just created PR #4138. The first comment of this PR discusses (at length!) how to connect the new importer to Leo's existing @jupytext code. Unless I am mistaken, the connection already works correctly!


The only remaining task is to complete jtm.create_outline. This method will be a straightforward adaptation of Thomas's script, as discussed earlier in this thread.


Edward

Edward K. Ream

Oct 29, 2024, 9:09:36 AM
to leo-e...@googlegroups.com
On Tue, Oct 29, 2024 at 8:05 AM Thomas Passin <tbp1...@gmail.com> wrote:

>> I've written this kind of code many times.
> This is exactly why I said it would be done more quickly if you do it.

Thomas, I am thrilled with our collaboration. I'm happy to enter another rodeo. Onward!

Edward

Thomas Passin

Oct 29, 2024, 9:11:00 AM
to leo-editor
I would like to make one more push for eliminating the '#' characters that comment out each and every jupytext line. They can be put back when the file is saved. They do nothing for a Leo user except add visual clutter and the possibility of error if someone accidentally removes or adds them, which is easy enough to do when some "#" characters mark the heading level and some are the line's comment prefix.

An alternative is to use jupytext's md format instead. It doesn't use comments, has a different md section marker, and denotes code blocks by the standard markdown triple fences (```), if I understand it right.

Thomas Passin

Oct 29, 2024, 9:40:09 AM
to leo-editor
On Tuesday, October 29, 2024 at 6:39:20 AM UTC-4 Edward K. Ream wrote:

The fruitful collaboration with Thomas continues. This Engineering Notebook post discusses adapting Thomas's prototype script to enhance how Leo handles @jupytext nodes.


Some nodes of the hierarchy may be missing, but that's not a problem. The importer will create organizer nodes for each missing level. This scheme "just works" because organizer nodes are invisible to the .ipynb file.


I see this part a little differently. I don't think there is any need for dummy organizer nodes. And what headline would they be given, anyway?

Let's start with an itemized list that we will convert to a Leo tree of nodes:

1.0 Intro
    1.1 Context
    1.2 Goals
2.0 Approach
    2.1 Requirements
        2.1.1 User requirements
        2.1.2 Maintenance requirements
            2.1.2.1 Scheduled
            2.1.2.2 Unplanned
3.0 Schedule

Note that there are indentation jumps, e.g., from level 3 indentation (2.1.2.2) to level 0 indentation (3.0). The natural translation into nodes is (denoting a node with the "-" character):

- 1.0 Intro
    - 1.1 Context
    - 1.2 Goals
- 2.0 Approach
    - 2.1 Requirements
        - 2.1.1 User requirements
        - 2.1.2 Maintenance requirements
            - 2.1.2.1 Scheduled
            - 2.1.2.2 Unplanned
- 3.0 Schedule

See? We don't need any extra organizer nodes, even where the indentation jumps back several levels. Each jupytext markdown cell either starts with a heading of some level or it doesn't. If it does, my script uses that heading to create the headline for the cell. If it doesn't, just the plain text of the first line should be used (I forget whether my script does this last bit or not). I only take the first 6 words of the header line, since you wouldn't want the whole line.
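
The headline rule just described might look like the sketch below. It is not the code from the script. The name headline_for_cell is hypothetical, and the sketch assumes the percent format, where a cell starts with a "# %%" marker and markdown lines carry a "#" comment prefix.

```python
def headline_for_cell(lines: list[str], max_words: int = 6) -> str:
    """Return a short headline taken from the first meaningful line of a cell."""
    for line in lines:
        # Skip the cell marker itself.
        if line.startswith('# %%'):
            continue
        # Drop the comment prefix, if any, then surrounding whitespace.
        text = line[1:].strip() if line.startswith('#') else line.strip()
        # Drop markdown heading markers such as "##".
        words = text.lstrip('#').split()
        if words:
            return ' '.join(words[:max_words])
    return ''
```

A code cell has no heading, so its first non-blank line of code supplies the headline.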

Edward K. Ream

Oct 29, 2024, 9:57:28 AM
to leo-e...@googlegroups.com
On Tue, Oct 29, 2024 at 8:40 AM Thomas Passin <tbp1...@gmail.com> wrote:

> See? We don't need any extra organizer nodes, even where the indentation jumps back several levels.

I wasn't clear. That's the easy case.

The only time we might want organizer nodes is when indentation increases by several levels.

Edward

Edward K. Ream

Oct 29, 2024, 10:02:39 AM
to leo-e...@googlegroups.com
On Tue, Oct 29, 2024 at 8:11 AM Thomas Passin <tbp1...@gmail.com> wrote:

I would like to make one more push for eliminating the '#' characters that comment out each and every jupytext line.

Here is a snippet from my test file:

# %%
2 + 666 + 4
# %%
print('hi changed externally')
# %% [markdown]
# This is a markdown cell


The two python lines do not start with '#'. If we remove comment lines it's likely to be challenging to put them back.

I'll leave this as an open question until I get most of the PR working.

Edward

Thomas Passin

Oct 29, 2024, 11:39:02 AM
to leo-editor
On Tuesday, October 29, 2024 at 10:02:39 AM UTC-4 Edward K. Ream wrote:
On Tue, Oct 29, 2024 at 8:11 AM Thomas Passin <tbp1...@gmail.com> wrote:

I would like to make one more push for eliminating the '#' characters that comment out each and every jupytext line.

Here is a snippet from my test file:

# %%
2 + 666 + 4
# %%
print('hi changed externally')
# %% [markdown]
# This is a markdown cell


The two python lines do not start with '#'. If we remove comment lines it's likely to be challenging to put them back.

I don't see it as any kind of a problem. Markdown cells get the comment character, code cells don't. The code and md markers have to be inserted either way. We can know the cell type by leaving in the cell marker. Or, as I would prefer, put "@language md" at the top of md cells and "@language python" at the top of code cells. When saving the outline, do a string substitution "@language md" -> "%% [markdown]", etc. Then insert "# " at the front of every line.
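
A minimal sketch of that round trip follows. The function names are hypothetical. The sketch assumes that every line of a markdown cell carries a "#" prefix, that blank markdown lines are written as a bare "#", and that cell markers carry no extra metadata. Real files may break these assumptions.

```python
MD_MARKER = '# %% [markdown]'
CODE_MARKER = '# %%'

def cell_to_node_body(cell: str) -> str:
    """Replace the cell marker with an @language directive; uncomment markdown."""
    marker, *body = cell.splitlines()
    if marker == MD_MARKER:
        # Drop "# ", or a bare "#" on blank markdown lines.
        body = [s[2:] if s.startswith('# ') else s.removeprefix('#') for s in body]
        return '\n'.join(['@language md', *body]) + '\n'
    return '\n'.join(['@language python', *body]) + '\n'

def node_body_to_cell(body_text: str) -> str:
    """Inverse of cell_to_node_body: restore the marker and comment prefixes."""
    directive, *body = body_text.splitlines()
    if directive == '@language md':
        body = ['# ' + s if s else '#' for s in body]
        return '\n'.join([MD_MARKER, *body]) + '\n'
    return '\n'.join([CODE_MARKER, *body]) + '\n'
```

Under these assumptions, writing a node back reproduces the original cell text, so the tiling property still holds.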

The code seems to me to be so simple, and the omission of comment characters to bring such benefits to users, that it would be very worthwhile.

Just imagine that you are a Jupyter Notebook user. You probably don't even know about Jupytext and its format. You may have been using VS Code's plugin for Jupyter. Someone has convinced you to try using Leo to work with notebook files. They tell you to open Leo, go to the bottom of the workbook, and "Import Any File" on one of your notebooks. You see this tree appear. You think it looks reasonable at first glance. You select one of the nodes.

Wait, what the heck are these "#" marks doing here? I don't want them! Can I delete them? Your friend says no, they need to stay in. Huh? But I want to add a level 2 headline with its "##" prefix, what do I do? "Just do it, but remember to add "# " to the front." OK, but here's this code cell. Do I have to add a comment to every new line I create? And there's this weird comment at the top of the cell? "No, that has to stay." Hmm, how do I execute code like I can do in VS Code? Where are the graphics? "Just regenerate them back in VS Code after you save the notebook in Leo." You know what, I think I'll just stick with VS Code, thank you very much.

Edward K. Ream

Oct 29, 2024, 12:10:32 PM
to leo-e...@googlegroups.com
On Tue, Oct 29, 2024 at 10:39 AM Thomas Passin <tbp1...@gmail.com> wrote:

> I don't see it as any kind of a problem.

Noted. I'll do this only if it is dead easy.

Anyway, I'm preoccupied with other matters just now.

Edward