Problem: Syntax highlighting spends time matching patterns on lines where
they cannot possibly match, which is noticeable in large files
with many syntax items.
Solution: Before running a pattern's regexp, skip it when the bytes it
requires are absent from the line (a per-pattern lead-byte
prefilter derived from the pattern at definition time). The
resulting highlighting is unchanged.
https://github.com/vim/vim/pull/20371
(2 files)
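The idea in the Solution paragraph can be sketched as follows. This is an illustration in Python, not the C code from the patch: the `Pattern` class, the `lead_bytes` field and the helper names are hypothetical, and the sketch assumes a pattern can be skipped when none of the bytes a match could start with occurs in the line.

```python
import re


class Pattern:
    """A syntax pattern plus an optional set of bytes a match can start with."""

    def __init__(self, source, lead_bytes=None):
        self.regex = re.compile(source.encode())
        # None means "unknown": the pattern is always tried (no speed-up).
        self.lead_bytes = lead_bytes


def candidates(patterns, line):
    """Yield only the patterns that could possibly match on this line."""
    present = set(line)  # iterating over bytes yields integer byte values
    for pat in patterns:
        if pat.lead_bytes is None or present & pat.lead_bytes:
            yield pat


def match_all(patterns, line, prefilter=True):
    """Run the regexps, optionally skipping patterns the prefilter rules out."""
    pool = candidates(patterns, line) if prefilter else patterns
    hits = []
    for pat in pool:
        m = pat.regex.search(line)
        if m:
            hits.append((pat.regex.pattern, m.start()))
    return hits


patterns = [
    Pattern(r'"[^"]*"', frozenset(b'"')),      # string constant
    Pattern(r'#\s*include', frozenset(b'#')),  # preprocessor line
    Pattern(r'\d+'),                           # no lead-byte set: always tried
]
line = b'int x = 42;'
print(len(list(candidates(patterns, line))), "of", len(patterns), "patterns tried")
```

The result list is the same with `prefilter=True` and `prefilter=False`; only the number of regexp executions differs.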
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.
Build: ./configure default CFLAGS (-g -O2), no profiling.
Method: force full-buffer syntax highlighting by calling synID() on every
line/column, measure wall time with reltime(), median of several runs.
Baseline = the same tree without the prefilter.
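The actual measurement ran inside Vim with synID() and reltime(). As a rough Python analogue of the "median of several runs" part only, the following sketch times a workload; `median_wall_time` and the stand-in workload are hypothetical names, not part of the benchmark script.

```python
import statistics
import time


def median_wall_time(workload, runs=5):
    """Run the workload several times and return the median wall time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; the real one is a synID() sweep over every line and column.
elapsed = median_wall_time(lambda: sum(range(100_000)), runs=5)
print(f"median: {elapsed:.6f} s")
```

Taking the median rather than the mean keeps a single slow run (for example a cold cache) from skewing the reported number.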
Full-buffer highlight:
| file (filetype) | lines | baseline | with prefilter | change |
|---|---|---|---|---|
| big.c, C (concatenated) | 99,192 | ~6.70 s | ~3.00 s | ~55% faster (~2.2x) |
| src/evalfunc.c, C | 12,919 | ~0.92 s | ~0.44 s | ~52% faster (~2.1x) |
| netrw.vim, Vim script | 9,717 | ~5.8 s | ~5.2 s | ~11% faster |
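For reference, the "change" column relates the two timings as sketched below. The helper name `change` is hypothetical, and the inputs are the rounded values from the table, so the results can differ slightly from figures computed on unrounded timings.

```python
def change(baseline_s, optimized_s):
    """Return (percent less wall time, speedup factor) for two timings."""
    saved_percent = (baseline_s - optimized_s) / baseline_s * 100
    speedup = baseline_s / optimized_s
    return saved_percent, speedup


rows = [
    ("big.c", 6.70, 3.00),
    ("src/evalfunc.c", 0.92, 0.44),
    ("netrw.vim", 5.8, 5.2),
]
for name, base, opt in rows:
    saved, factor = change(base, opt)
    print(f"{name}: {saved:.0f}% less time, {factor:.2f}x")
```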
Mechanism: on a typical C buffer about 40% of regexp executions are patterns
that never match on that line (for example character/string-constant and
preprocessor patterns that are tried on every line). The prefilter removes
most of them; the share of regexp time spent on never-matching patterns drops
from ~40% to ~15% (measured with :syntime).
The gain depends on the filetype. For regexp-pattern-heavy syntaxes (C/C++)
with many never-matching patterns it is large. For keyword-heavy syntaxes
(Vim script), where much of the work is keyword lookup rather than regexp
matching, the overall gain is smaller, but regexp executions are still cut
substantially with identical highlighting (netrw.vim: 1,680,030 -> 1,024,966
regexp calls, about 39% fewer).
Correctness: byte-for-byte identical synID() output vs. the baseline across
C, C++, Vim script, Python, Ruby, Lua, JavaScript, shell, HTML and CSS
(millions of cells), including multibyte content, very long lines and a
reduced 'synmaxcol'.
Tests: test_syntax (including a new Test_syntax_lead_byte_prefilter regression
test), test_highlight, test_spell and test_textprop all pass.
—
I want to flag the main maintenance concern with this change myself.
Concern. The lead-byte prefilter derives its first-byte / required-byte sets
by re-implementing a subset of Vim's regexp grammar in syn_compute_first_bytes()
and its helpers: magic-mode parsing, character classes, anchors, quantifiers,
groups and alternation. That means:
- ... a\|b), look-around (\@<= / \@!), ignore-case (\c and :syn case\c \v ...).
- ... \<char> escape, especially a zero-width ...

Mitigation I plan to add (commit on this branch). A differential test that
makes the optimization self-checking:

- test_override('syn_prefilter', 1) to disable the prefilter at runtime
- ... nfa_fail etc.).
- ... synID() output

With that test in place, any future edit, or any new regexp construct the
analyzer fails to model, surfaces as a CI failure instead of silently wrong
highlighting. The optimization stays safe to maintain regardless of the
analyzer's complexity. The analyzer is already written to bail out (keep
trying the pattern, behaviour unchanged) on anything it does not model, so the
intended degradation is "no speed-up", never "wrong result"; the differential
test is what guarantees that property keeps holding.
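The shape of such a differential check can be sketched in Python. This is not the Vim test itself: `reference` and `optimized` are hypothetical stand-ins for synID() with the prefilter disabled and enabled, and `first_difference` is an illustrative helper.

```python
def first_difference(reference, optimized, buffer):
    """Compare two highlighters cell by cell.

    Returns (lnum, col, expected, actual) for the first mismatch, or None
    when both produce identical output for every cell of the buffer.
    """
    for lnum, line in enumerate(buffer, start=1):
        for col in range(1, len(line) + 1):
            expected = reference(lnum, col)
            actual = optimized(lnum, col)
            if expected != actual:
                return (lnum, col, expected, actual)
    return None


buffer = ["int x;", "// c"]


def reference(lnum, col):
    # Toy highlighter: group 1 for letters, group 0 for everything else.
    return 1 if buffer[lnum - 1][col - 1].isalpha() else 0


def broken(lnum, col):
    # Simulates an optimization that wrongly skips one cell.
    return 0 if (lnum, col) == (2, 4) else reference(lnum, col)


print(first_difference(reference, reference, buffer))  # None
print(first_difference(reference, broken, buffer))     # (2, 4, 1, 0)
```

Reporting the first differing cell, rather than a plain pass/fail, makes a CI failure point directly at the line and column where the analyzer went wrong.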
Alternatives considered (not in this PR).
- ... \<char> escape bails instead ...
—
> A differential test that makes the optimization self-checking:
Done.
—
The "syntax highlighting performance trilogy" is now complete. This PR (#20371) is the first part; two more are lined up in my fork, to be submitted upstream after this one lands:
- in_id_list() cache. Deciding whether a group is in a contains/cluster list scans the list and expands clusters on every check. Resolve each list once into a sorted, cluster-expanded set of group IDs and use a binary search, cached per syntax block and dropped when syntax definitions change.

All three are pure speedups: the resulting highlighting is unchanged, and each one ships with a test_override() flag plus a test that asserts identical synID() output with the optimization on and off.
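The in_id_list() cache described above can be sketched as follows. This is a simplified Python illustration, not the C implementation: `IdListCache`, its method names and the cluster representation (names mapped to lists of integer group IDs and nested cluster names) are assumptions, and special list entries that Vim supports are left out.

```python
from bisect import bisect_left


class IdListCache:
    """Resolve contains/cluster lists once, then answer lookups by binary search."""

    def __init__(self, clusters):
        self.clusters = clusters  # cluster name -> list of group IDs / cluster names
        self._resolved = {}       # list key -> sorted list of expanded group IDs

    def _expand(self, items, seen):
        ids = set()
        for item in items:
            if isinstance(item, int):
                ids.add(item)
            elif item not in seen:  # guard against clusters that reference each other
                seen.add(item)
                ids |= self._expand(self.clusters.get(item, []), seen)
        return ids

    def contains(self, list_key, items, group_id):
        resolved = self._resolved.get(list_key)
        if resolved is None:  # first check for this list: expand and sort once
            resolved = sorted(self._expand(items, set()))
            self._resolved[list_key] = resolved
        i = bisect_left(resolved, group_id)
        return i < len(resolved) and resolved[i] == group_id

    def invalidate(self):
        """Drop all resolved lists, e.g. when syntax definitions change."""
        self._resolved.clear()


cache = IdListCache({"@outer": [3, "@inner"], "@inner": [7, "@outer"]})
print(cache.contains("demo", [1, "@outer"], 7))
```

The `invalidate()` call matters: without it, a lookup after a cluster changes would still see the old expansion.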
Measured on a single binary that contains all three, toggling each optimization via its test_override() flag, so "all off" reproduces the current (master) algorithm. The workload is a full-buffer synID() sweep (every line, up to end-of-line); median of 5 runs.
C source (src/eval.c, 8209 lines)
| Configuration | Time | Speedup |
|---|---|---|
| baseline (all off ≈ master) | 0.570s | 1.00× |
| + lead-byte prefilter (#20371) | 0.253s | 2.25× |
| + in_id_list cache (#29) | 0.551s | 1.03× |
| + saved-state hint (#30) | 0.526s | 1.08× |
| all three on | 0.200s | 2.86× |
Vim script (netrw.vim, 9717 lines)
| Configuration | Time | Speedup |
|---|---|---|
| baseline (all off ≈ master) | 7.92s | 1.00× |
| + lead-byte prefilter (#20371) | 7.16s | 1.11× |
| + in_id_list cache (#29) | 5.34s | 1.48× |
| + saved-state hint (#30) | 7.44s | 1.06× |
| all three on | 4.28s | 1.85× |
The three target different bottlenecks and are complementary: the prefilter dominates for C (many patterns with distinct lead bytes), the in_id_list() cache dominates for Vim script (large contains/cluster lists, e.g. netrw), and the saved-state hint gives a small but filetype-independent gain. Combined, that's roughly 2.9× on C-heavy code and 1.85× on Vim-script-heavy code, with highlighting output unchanged.
—
@chrisbra Do you have any concerns?
—
no, this sounds like a very nice performance optimization. I'll have a closer look later, this is all a bit over my head right now :)
—
Update: I've folded in the safe-by-default tightening I listed as a follow-up
in the self-review above. Unknown alphanumeric regexp escapes now bail (the
pattern is always tried) rather than being treated as ordinary atoms, so the
analyzer degrades to "no speed-up", never "wrong highlighting", even for regexp
constructs added in the future. Covered by the new
Test_syntax_prefilter_classes() (identical synID() with the prefilter on and
off).
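The "bail on anything not modelled" rule can be illustrated with a deliberately tiny analyzer in Python. It is not syn_compute_first_bytes(): the function names, the two modelled class escapes and the set of literal escapes are assumptions, it only covers a small magic-mode subset, and it assumes case-sensitive matching. Returning `None` means "no prefilter, always try the pattern".

```python
import string

KNOWN_CLASS_ESCAPES = {"d": set(string.digits), "s": {" ", "\t"}}
LITERAL_ESCAPES = set(".*\\/[$^~")          # e.g. \. is a literal dot
MAY_MATCH_EMPTY = {"=", "?", "{", "@"}      # \= \? \{ \@ after the first atom


def tokenize(pattern):
    """Split into ("esc", c) and ("lit", c) tokens; None on a trailing backslash."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern[i] == "\\":
            if i + 1 >= len(pattern):
                return None
            tokens.append(("esc", pattern[i + 1]))
            i += 2
        else:
            tokens.append(("lit", pattern[i]))
            i += 1
    return tokens


def lead_chars(pattern):
    """Return the characters a match must start with, or None to bail."""
    tokens = tokenize(pattern)
    if not tokens:
        return None
    for kind, ch in tokens:
        # Alternation and any unknown alphanumeric escape (\c, \v, ...) bail.
        if kind == "esc" and (ch == "|" or (ch.isalnum() and ch not in KNOWN_CLASS_ESCAPES)):
            return None
    kind, ch = tokens[0]
    if kind == "lit":
        if ch in ".[^$~*":                  # special first characters: not modelled
            return None
        first = {ch}
    elif ch in KNOWN_CLASS_ESCAPES:
        first = set(KNOWN_CLASS_ESCAPES[ch])
    elif ch in LITERAL_ESCAPES:
        first = {ch}
    else:
        return None                         # \( \< \% ...: not modelled
    if len(tokens) > 1:
        nkind, nch = tokens[1]
        if (nkind == "lit" and nch == "*") or (nkind == "esc" and nch in MAY_MATCH_EMPTY):
            return None                     # first atom may be skipped or zero-width
    return first


for pat in ["#include", r"\d\+", r"a\|b", r"foo\c", r"\qfoo"]:
    print(pat, "->", lead_chars(pat))
```

The design choice is that every unrecognized construct lands in a `return None` branch, so an incomplete model costs speed but never changes which patterns can match.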