ENB: Enhancing Leo's colorizer


Edward K. Ream

Nov 2, 2024, 10:34:18 AM
to leo-editor

If I were writing Leo's colorizer from scratch, I would scrap the way it works. This Engineering Notebook post:


- Discusses ideas that arose from my recent work with @language jupytext.

- Shows how to use these ideas to extend Leo's colorizer safely.


Terminology


Remember this crucial distinction:


- A mode file is a file in the leo/modes directory.

- Leo's colorizer is an instance of the JEditColorizer class in leo/core/colorizer.py.


Tokenizing mode files


When importing a mode file, say x.py, Leo's colorizer will first look for a file named token_x.py. Let's call such files tokenizing mode files. Such files must have top-level init_text and colorize_line functions. Tokenizing mode files may also have other top-level functions. See below.


When colorizing a tokenizing mode file, the colorizer will call init_text(p.b) whenever p changes.


To colorize each line, the colorizer will use this new main loop:


def tokenizingMainLoop(self, n: int, s: str) -> None:
    """Colorize a *single* line s, starting in state n."""
    t1 = time.process_time()
    colorizer = self
    self.colorizer_module.colorize_line(colorizer, n, s)
    self.tot_time += time.process_time() - t1


This new main loop should be much faster than the existing main loop! It colorizes by lines, not characters!


Design of tokenizing mode files


Tokenizing mode files will keep track of their own state.


Most tokenizing mode files will be line-oriented tokenizers. token_python.py might even use Python's tokenize module! The speedup should be significant.
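Here is a minimal sketch of how a line-oriented tokenizer might lean on Python's tokenize module. It is not code from Leo. The function name python_line_tokens and the tag names ("comment1", "literal1", "keyword1") are assumptions made for illustration. A single line can be an incomplete construct, so the sketch keeps whatever was tokenized before an error.

```python
import io
import keyword
import tokenize


def python_line_tokens(s: str) -> list:
    """Return (start, end, tag) triples for one line of Python source.

    Hypothetical helper: the tag names are illustrative assumptions.
    """
    results = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(s).readline):
            if tok.type == tokenize.COMMENT:
                tag = "comment1"
            elif tok.type == tokenize.STRING:
                tag = "literal1"
            elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
                tag = "keyword1"
            else:
                continue  # Leave every other token uncolored.
            # tok.start and tok.end are (row, column) pairs.
            results.append((tok.start[1], tok.end[1], tag))
    except (tokenize.TokenError, SyntaxError):
        # One line may be an incomplete construct, such as an open bracket.
        # Keep the tokens found so far.
        pass
    return results
```

For example, python_line_tokens("return x  # hi") yields one keyword range and one comment range.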


The top-level colorize_line(colorizer, n, s) function will call colorizer.colorRangeWithTag to do the actual coloring. Tokenizing mode files will know nothing else about the colorizer! No more calls to the colorizer's weird pattern matchers! No more interface dictionaries!
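A tokenizing mode file along these lines might look like the sketch below. The argument order follows the call in the main loop above. The regex, the tag names, and the RecordingColorizer stand-in are assumptions for illustration only. The stand-in assumes that colorRangeWithTag takes the line, a start index, an end index, and a tag.

```python
import re
from typing import Any

# Hypothetical contents of a tokenizing mode file such as token_x.py.
# The pattern and the tag names are illustrative assumptions.
TOKEN_PATTERN = re.compile(
    r"(?P<comment1>#.*)"
    r"|(?P<literal1>\"[^\"]*\"|'[^']*')"
    r"|(?P<keyword1>\b(?:def|class|return|import)\b)"
)


def init_text(text: str) -> None:
    """Called when the selected node changes. This sketch keeps no state."""


def colorize_line(colorizer: Any, n: int, s: str) -> None:
    """Colorize the single line s. The state n is unused in this sketch."""
    for m in TOKEN_PATTERN.finditer(s):
        # m.lastgroup is the name of the alternative that matched.
        colorizer.colorRangeWithTag(s, m.start(), m.end(), m.lastgroup)


class RecordingColorizer:
    """Stand-in for JEditColorizer that only records the requested ranges."""

    def __init__(self) -> None:
        self.calls = []

    def colorRangeWithTag(self, s: str, i: int, j: int, tag: str) -> None:
        self.calls.append((i, j, tag))
```

The mode file touches the colorizer only through colorRangeWithTag, so a recording stand-in is enough to test it without Qt.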


Delegation between tokenizing mode files suddenly becomes straightforward! Tokenizing mode files will call the top-level init_delegated_state function of another tokenizing mode file.


Safety


Experimenting with this scheme carries minimal risks:


- No existing mode files will change!

  I'll only add new tokenizing mode files.

- Only JEditColorizer.init_mode will change.

  It will contain new code that loads tokenizing mode files if they exist.

  All other parts of this method will remain unchanged.
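The new loading code might follow the sketch below. It is not the actual change to JEditColorizer.init_mode. The function name load_tokenizing_mode, the package argument, and the fallback convention (return None to use the existing mode file) are assumptions.

```python
import importlib
from types import ModuleType
from typing import Optional

# The two top-level functions every tokenizing mode file must define.
REQUIRED_FUNCTIONS = ("init_text", "colorize_line")


def load_tokenizing_mode(
    language: str, package: str = "leo.modes"
) -> Optional[ModuleType]:
    """Return the module token_<language>, or None to fall back.

    Hypothetical helper: None means "use the existing mode file".
    """
    try:
        module = importlib.import_module(f"{package}.token_{language}")
    except ImportError:
        return None  # No tokenizing mode file exists for this language.
    # Reject files that lack the required top-level functions.
    for name in REQUIRED_FUNCTIONS:
        if not callable(getattr(module, name, None)):
            return None
    return module
```

Because the helper returns None whenever anything is missing, languages without a token_x.py file would take the existing code path unchanged.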


Summary


This scheme is worth investigating:


- Experiments carry no significant risk.

- Colorizing should be much faster.

- Tokenizing mode files will know almost nothing about Leo's colorizer.

- Delegation between tokenizing mode files should collapse in complexity.


Edward


P.S. Tokenizing mode files might call helper modes defined in leo/modes/_helper_modes.py.


Helper modes might handle Leo keywords and decorators. Delegating colorizing to helper modes would provide an early test of the new delegation scheme.


EKR

Edward K. Ream

Nov 2, 2024, 2:20:24 PM
to leo-e...@googlegroups.com
On Sat, Nov 2, 2024 at 9:34 AM Edward K. Ream <edre...@gmail.com> wrote:

> This Engineering Notebook post...shows how to use these ideas to extend Leo's colorizer safely.

First, I'll create a new mode file for @jupytext. This file will be a good test bed because it will depend heavily on delegation.

> When colorizing a tokenizing mode file, the colorizer will call init_text(p.b) whenever p changes.

init_text should take different arguments:

    def init_text(colorizer, c, p):

> P.S. Tokenizing mode files might call helper modes defined in leo/modes/_helper_modes.py.

On second thought, this file is unnecessary. Instead, we can add new helpers to the JEditColorizer class. Some of these new methods will support delegation with either:

- an embedded main loop, as with @jupytext now.
- calls to a new-style file.

Summary

No more invention is needed. The details may change, but everything will happen naturally.

Edward

Edward K. Ream

Nov 3, 2024, 4:28:16 AM
to leo-e...@googlegroups.com
On Saturday, November 2, 2024 at 9:34:18 AM UTC-5 Edward K. Ream wrote:


> To colorize each line, the colorizer will use this new main loop:


I forgot that the new main loop must update the QSyntaxHighlighter state! Like this:


def newMainLoop(self, n: int, s: str) -> None:
    """Colorize a *single* line s, starting in state n."""
    # Colorize line s.
    state = self.new_mode_module.colorize_line(s, n)

    # Set the state for QSyntaxHighlighter!
    n = self.computeState(f=None, state=state)
    self.setState(n)

Discussion

In the old world, updating the QSyntaxHighlighter state happens deep within the bowels of the pattern matchers, that is, within the so-called "restart" methods. The old main loop must not update the state explicitly!

In contrast, the new main loop must update the QSyntaxHighlighter state. The new scheme should be significantly easier to understand and debug.
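QSyntaxHighlighter stores each block's state as an integer, so whatever state a mode file returns must be mapped to an int before it is set. The sketch below shows one way to do that mapping. It is not Leo's computeState method. The StateTable class and its method names are hypothetical.

```python
class StateTable:
    """Intern hashable mode states as small non-negative integers.

    Hypothetical helper, not part of Leo.
    """

    def __init__(self) -> None:
        self._to_int = {}    # state -> integer
        self._to_state = []  # integer -> state

    def state_number(self, state) -> int:
        """Return the integer for state, allocating one on first use."""
        n = self._to_int.get(state)
        if n is None:
            n = len(self._to_state)
            self._to_int[state] = n
            self._to_state.append(state)
        return n

    def state_for(self, n: int, default=None):
        """Return the state for integer n, or default if n is unknown."""
        if 0 <= n < len(self._to_state):
            return self._to_state[n]
        return default
```

QSyntaxHighlighter reports -1 for a block with no state, so state_for(-1) returns the default rather than raising.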

Edward

Edward K. Ream

Nov 3, 2024, 4:38:15 AM
to leo-editor
On Sunday, November 3, 2024 at 3:28:16 AM UTC-6 Edward K. Ream wrote:

> state = self.new_mode_module.colorize_line(s, n)

This pattern suggests that each new-style mode file's colorize_line function will be a state machine, but that's not an actual requirement.

Furthermore, colorize_line will likely ignore the integer "n" arg, maintaining its own internal state instead.

Edward