This Engineering Notebook post contains design and coding notes about PR #4147. As always, please feel free to ignore this ENB.
This ENB is unusually long and complex. It is essentially notes to myself. It is also pre-writing for:
- A new info item: Theory of Operation for Leo's colorizer.
- Several (long!) docstrings in various methods in the JEditColorizer (jedit) class. Unless indicated otherwise, all methods mentioned here are members of the jedit class.
These notes are essential--it's way too easy to become confused in a maze of contexts and hidden relationships. Such confusion caused a gigantic performance bug a decade ago. A similar confusion lies beneath the present minor performance bug.
I have spent several days tracing the code. As an essential first step, I removed all the hacks that partially ruined the legacy code. Now I know how the qsh works (and should work) when Leo isn't mucking things up. This ENB rests on that new foundation of understanding.
Requirements
The legacy code in the jedit class might seem overly complex. But it's actually a superb piece of engineering! Anyway, only the profoundly ignorant would suggest wholesale changes. Indeed, the jedit class is highly reliable, despite its performance bug.
The following requirements will ensure that the PR fixes the performance bug while retaining the colorizer's outstanding reliability.
1. The PR must not interfere with the qsh. The qsh is highly optimized. The previous attempts to "help" it were grossly misguided! The PR must remove all such hacks and interference.
2. The PR must not change mainLoop, the method that calls the pattern matchers.
3. The PR must not change any pattern matcher. As an exception, the pattern matchers that handle @language, @color, etc., may change how they handle the qsh state. Such semantic changes are likely to be straightforward.
4. The PR must not change any code outside leoColorizer.py. As the single exception, LeoTree.select may call c.recolor(p), where p is the new value of c.p.
The PR's remaining bug
The PR already colors most nodes correctly. However, it does not handle nodes containing multiple @language directives. The PR probably also has some issues with nodes containing @nocolor, @color and @killcolor.
I now know why the PR doesn't work perfectly. The qsh can (and will) call _recolor(s) where s is out-of-sequence with the preceding call. For example, the following snippet is typical in @jupytext nodes:
# ?? (like @language python)
def spam():
# %% [markdown] (like @language md)
# Section 1
It was the best of times; it was the worst of times.
The user can click (or cut or paste) anywhere in this text. The qsh coordinates with QTextEdit to detect changes. The qsh then calls _recolor(s) starting with the first changed line. There is no necessary relationship between s and the line that _recolor last saw!
The legacy code attempted to switch between languages using its language ivar. But this approach can't handle out-of-sequence calls to _recolor. It's as simple as that. Happily, tweaking the legacy code should squash this bug.
The legacy colorizer works correctly!
Overview of the colorizer
The calling sequence of the various components is as follows:
LeoTree.select changes c.p. It's a complex task. It must not change in any way except that it may call c.recolor right at the very end.
LeoTree.select calls c.recolor(p), where p is Leo's new position. In turn, c.recolor calls c.colorize, the only entry point for the jedit colorizer.
At present, c.colorize does some preliminary initialization. I'm not sure that's required. In any event, c.colorize ends as follows:
# Tell QSyntaxHighlighter to do a full recolor.
try:
self.in_full_redraw = True # Debugging only.
self.highlighter.rehighlight()
finally:
self.in_full_redraw = False
This code causes the qsh to call _redraw(s) whenever the user changes p.b. Remember that "s" is a single line of text. Forget this at your peril!
The big picture is this: _redraw and the qsh are expected to collaborate. In particular, _redraw must:
- Call qsh methods to colorize s, a single line of text.
- Inform the qsh of the qsh state of the end of the colorized line.
If _redraw sets the qsh state properly, the qsh will issue highly optimized calls to _redraw(s). In other words, the qsh will call _redraw(s) only if the coloring for s might change. This optimization is the reason why the qsh exists!
Summary of _redraw
To recap: _redraw(s) must colorize "s" (a single line of text) and inform the qsh of the qsh state that should be in effect afterward.
_redraw computes the initial qsh state and calls mainLine to colorize s. The mainLine method does not change state directly! It only calls pattern matches. The pattern matchers compute the final qsh state as a side effect.
Summary
Early commits to the PR removed all hacks and added traces based on new tracing tools.
Understanding, not guesswork, is the only way to improve Leo's colorizer. This ENB solidifies my insights. It's time to fix the bug!
This ENB is pre-writing for several (long!) docstrings:
# Read this docstring or suffer!
<< jedit.whatever: docstring >>
The crux of the PR lies in _recolor. This method must compute the initial qsh state so that the qsh calls _redraw as few times as possible.
The final code should look natural and uncomplicated. Oh how appearances can be deceiving! It's been a long road.
Edward
P.S. It has taken many years to complete Leo's colorizer. There is no shame in that. My recent studies depended on essential tools that didn't exist almost two decades ago when I first wrote Leo's colorizer:
- PRs: a permanent record of work.
- git bisect: risk-free programming.
- python's f-strings, g.app.debug and g.printObj.
These tools make sophisticated traces possible.
- cff: the all-important study tool.
Investigating the performance bug required all of these tools. Even so, the PR is the record of arduous work.
EKR
> I removed all the hacks that partially ruined the legacy code.
In other words, I am already eating my own dog food (in the PRs branch). It's easy to ignore the remaining coloring glitch. That's good news. There will be little risk in merging the PR into "devel."
Other notes
In retrospect, yesterday's study was possible at any time in the colorizer's history. Oh well, mistakes happen. I have some theories about why I never did the obvious study, but I'm happy to move on.
Aha! The colorizer must be, in a limited sense, stateless. True, _recolor uses and manipulates the state of the line being colorized. However, it is futile to maintain any data between calls to _recolor. I shall remove all such ivars.
Edward
P. S. Heh. The Aha has revealed an unimportant edge case.
After an @nocolor directive pauses coloring, what language should be in effect after a later @color directive? It could be either:
- the previously active color, or
- the default color that applies at the start of each node.
It will be amusing to see what the legacy code does, but I shall not make any changes.
EKR
Aha! The colorizer must be, in a limited sense, stateless. True, _recolor uses and manipulates the state of the line being colorized. However, it is futile to maintain any data between calls to _recolor. I shall remove all such ivars.
This ENB is ...pre-writing for... several (long!) docstrings.
In other words, I am already eating my own dog food (in the PRs branch).
It's easy to ignore the remaining coloring glitch.
I spoke too soon. Coloring markdown improperly quickly becomes super annoying.