Problem: Syntax highlighting spends time matching patterns on lines where
they cannot possibly match, which is noticeable in large files
with many syntax items.
Solution: Before running a pattern's regexp, skip it when the bytes it
requires are absent from the line (a per-pattern lead-byte
prefilter derived from the pattern at definition time). The
resulting highlighting is unchanged.
https://github.com/vim/vim/pull/20371
(2 files)
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Build: ./configure default CFLAGS (-g -O2), no profiling.
Method: force full-buffer syntax highlighting by calling synID() on every
line/column, measure wall time with reltime(), median of several runs.
Baseline = the same tree without the prefilter.
Full-buffer highlight:
| file (filetype) | lines | baseline | with prefilter | change |
|---|---|---|---|---|
| big.c, C (concatenated) | 99,192 | ~6.70 s | ~3.00 s | ~55% faster (~2.2x) |
| src/evalfunc.c, C | 12,919 | ~0.92 s | ~0.44 s | ~52% faster (~2.1x) |
| netrw.vim, Vim script | 9,717 | ~5.8 s | ~5.2 s | ~11% faster |
Mechanism: on a typical C buffer about 40% of regexp executions are patterns
that never match on that line (for example character/string-constant and
preprocessor patterns that are tried on every line). The prefilter removes
most of them; the share of regexp time spent on never-matching patterns drops
from ~40% to ~15% (measured with :syntime).
The gain depends on the filetype. For regexp-pattern-heavy syntaxes (C/C++)
with many never-matching patterns it is large. For keyword-heavy syntaxes
(Vim script), where much of the work is keyword lookup rather than regexp
matching, the overall gain is smaller, but regexp executions are still cut
substantially with identical highlighting (netrw.vim: 1,680,030 -> 1,024,966
regexp calls, about 39% fewer).
Correctness: byte-for-byte identical synID() output vs. the baseline across
C, C++, Vim script, Python, Ruby, Lua, JavaScript, shell, HTML and CSS
(millions of cells), including multibyte content, very long lines and a
reduced 'synmaxcol'.
Tests: test_syntax (including a new Test_syntax_lead_byte_prefilter regression
test), test_highlight, test_spell and test_textprop all pass.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I want to flag the main maintenance concern with this change myself.
Concern. The lead-byte prefilter derives its first-byte / required-byte sets
by re-implementing a subset of Vim's regexp grammar in syn_compute_first_bytes()
and its helpers: magic-mode parsing, character classes, anchors, quantifiers,
groups and alternation. That means:
a\|b), look-around (\@<= / \@!), ignore-case (\c and :syn case\c \v ...).\<char> escape — especially a zero-widthMitigation I plan to add (commit on this branch). A differential test that
makes the optimization self-checking:
test_override('syn_prefilter', 1) to disable the prefilter at runtimenfa_fail etc.).synID() outputWith that test in place, any future edit — or any new regexp construct the
analyzer fails to model — surfaces as a CI failure instead of silently wrong
highlighting. The optimization stays safe to maintain regardless of the
analyzer's complexity. The analyzer is already written to bail out (keep
trying the pattern, behaviour unchanged) on anything it does not model, so the
intended degradation is "no speed-up", never "wrong result"; the differential
test is what guarantees that property keeps holding.
Alternatives considered (not in this PR).
\<char> escape bails instead—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
A differential test that makes the optimization self-checking:
Done.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
The "syntax highlighting performance trilogy" is now complete. This PR (#20371) is the first part; two more are lined up in my fork, to be submitted upstream after this one lands:
in_id_list() cache. Deciding whether a group is in a contains/cluster list scans the list and expands clusters on every check. Resolve each list once into a sorted, cluster-expanded set of group IDs and use a binary search, cached per syntax block and dropped when syntax definitions change.All three are pure speedups — the resulting highlighting is unchanged, and each one ships with a test_override() flag plus a test that asserts identical synID() output with the optimization on and off.
Measured on a single binary that contains all three, toggling each optimization via its test_override() flag, so "all off" reproduces the current (master) algorithm. The workload is a full-buffer synID() sweep (every line, up to end-of-line); median of 5 runs.
C source (src/eval.c, 8209 lines)
| Configuration | Time | Speedup |
|---|---|---|
| baseline (all off ≈ master) | 0.570s | 1.00× |
| + lead-byte prefilter (#20371) | 0.253s | 2.25× |
| + in_id_list cache (#29) | 0.551s | 1.03× |
| + saved-state hint (#30) | 0.526s | 1.08× |
| all three on | 0.200s | 2.86× |
Vim script (netrw.vim, 9717 lines)
| Configuration | Time | Speedup |
|---|---|---|
| baseline (all off ≈ master) | 7.92s | 1.00× |
| + lead-byte prefilter (#20371) | 7.16s | 1.11× |
| + in_id_list cache (#29) | 5.34s | 1.48× |
| + saved-state hint (#30) | 7.44s | 1.06× |
| all three on | 4.28s | 1.85× |
The three target different bottlenecks and are complementary: the prefilter dominates for C (many patterns with distinct lead bytes), the in_id_list() cache dominates for Vim script (large contains/cluster lists, e.g. netrw), and the saved-state hint gives a small but filetype-independent gain. Combined, that's roughly 2.9× on C-heavy code and 1.85× on Vim-script-heavy code, with highlighting output unchanged.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@chrisbra Do you have any concerns?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
no, this sounds like a very nice performance optimization. I'll have a closer look later, this is all a bit over my head right now :)
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Update: I've folded in the safe-by-default tightening I listed as a follow-up
in the self-review above. Unknown alphanumeric regexp escapes now bail (the
pattern is always tried) rather than being treated as ordinary atoms, so the
analyzer degrades to "no speed-up", never "wrong highlighting", even for regexp
constructs added in the future. Covered by the new
Test_syntax_prefilter_classes() (identical synID() with the prefilter on and
off).
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I think the first/required byte analysis should be performed by the BT engine, it already provides regstart and regmust. The new string parser does more that the BT optimisations but implementing this in the regex engine itself should afford a lot of opportunities to avoid bailing out as often as is being done here.
It's just a first impression but the potential to introduce subtle, long lasting, bugs seems high. However, the performance improvements look very nice and I'm unaware of any kittens ever being harmed by bad syntax highlighting.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@dkearns Thanks for the review!
One clarification: this isn't a "new string parser" that reimplements or
belongs in the regex engine. It does no regex matching and isn't an alternative
to (or optimization of) the BT/NFA engines.
It's a cheap pre-screen in the syntax layer that avoids calling the engine
when a pattern provably can't match the line. At definition time it analyzes
the pattern string once, only to derive a sound necessary condition (the set of
bytes a match may start with, and the set of bytes that must occur in the line).
At runtime it does no regex work: it builds the line's byte set once per line,
then rejects a pattern with a couple of bitset ops before entering the engine.
If it passes, the pattern goes to the engine exactly as before.
So the goal isn't a faster regexec — it's skipping the call entirely. Syntax
highlighting runs (patterns × lines) attempts, and the line byte set is built
once and shared across all patterns, so this amortization only works at the
syntax layer. regstart/regmust are a different, weaker thing (a single
start char and one literal substring, consulted inside vim_regexec()); for
the common case of keyword alternations like \<\(int\|long\|char\)\> they
give no help, while the prefilter can still skip lines containing none of
i,l,c.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I understand, I'm suggesting the entire pre-screen pattern analysis is happening in the wrong place and an unnecessary duplication. It could be done during regex compilation and the results then exposed for use by the syntax layer. The patterns are all compiled at file load time anyway.
I missed it on my first pass but it looks you've said exactly the same thing yourself. :)
Longer term, derive the first-byte set from the compiled NFA start states
instead of re-parsing the pattern text. That removes the grammar-tracking
burden entirely (single source of truth = the regexp engine), at the cost of
coupling to engine internals and handling the BT fallback. This would be a
separate, larger change.
Why implement an intermediate solution?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Fair question, but my answer is the opposite of what that quote implies: I'm
not planning to do the engine-derived version, and this isn't a step toward it.
It wouldn't be "expose what the engine already has". regstart is a single
start char (NUL the moment there's an alternation) and regmust is a single
literal substring (BT only). The prefilter needs a set of possible first
bytes and a set of required bytes, so doing it in the engine means adding new
set-valued outputs to both the BT and the NFA compilers and keeping them in
agreement — a substantial change in execution-critical code shared by
search/substitute/match. That is a lot of work and risk for an optimization
whose whole job is to never change behaviour, and the "subtle, long-lasting
bugs" you rightly worried about would land squarely in the engines.
I don't think that cost is justified. This prefilter is deliberately shallow:
it lives entirely in the syntax layer, touches no engine code, bails on
anything it doesn't model (so the worst case is "no speed-up", never "wrong
result"), and is pinned by an on/off differential test requiring byte-for-byte
identical highlighting. Since your review I've made it bail more aggressively—
\{n,m} and all character classes are no longer modeled — which both fixes
the \{42,0} case you found and shrinks the grammar surface that could ever
drift. The guiding principle is now stated at the top of the analyzer:
correctness over coverage, widen-only, when in doubt bail.
So I'd retract that "longer term" note rather than act on it: single source of
truth would be tidier in the abstract, but it isn't worth touching two regex
engines for. A shallow, bail-happy, self-contained prefilter already gives the
measured speed-up, and that's a good place to stop. I've dropped the misleading
bullet from the description.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I've re-run the benchmarks. (First result: #20371 (comment))
The numbers for Part 1 have dropped compared to the initial benchmark, but this is because I changed the prefilter analyzer to be safe-by-default (bailing out to avoid regressions). Even so, it still maintains an overall 2x speed-up, so I believe this patch is well worth merging.
The "syntax highlighting performance trilogy" is now complete. This PR (#20371) is the first part; two more are lined up in my fork, to be submitted upstream after this one lands:
in_id_list() cache. Deciding whether a group is in a contains/cluster list scans the list and expands clusters on every check. Resolve each list once into a sorted, cluster-expanded set of group IDs and use a binary search, cached per syntax block and dropped when syntax definitions change.All three are pure speedups — the resulting highlighting is unchanged, and each one ships with a test_override() flag plus a test that asserts identical synID() output with the optimization on and off.
Measured on a single binary that contains all three, toggling each optimization via its test_override() flag, so "all off" reproduces the current (master) algorithm. The workload is a full-buffer synID() sweep (every line, up to end-of-line); median of 5 runs.
C source (src/eval.c, 8209 lines)
| Configuration | Time | Speedup |
|---|---|---|
| baseline (all off ≈ master) | 0.788s | 1.00× |
| + lead-byte prefilter (#20371) | 0.500s | 1.57× |
| + in_id_list cache (#29) | 0.753s | 1.05× |
| + saved-state hint (#30) | 0.609s | 1.29× |
| all three on | 0.358s | 2.20× |
Vim script (netrw.vim, 9717 lines)
| Configuration | Time | Speedup |
|---|---|---|
| baseline (all off ≈ master) | 9.16s | 1.00× |
| + lead-byte prefilter (#20371) | 8.21s | 1.12× |
| + in_id_list cache (#29) | 4.73s | 1.94× |
| + saved-state hint (#30) | 8.30s | 1.10× |
| all three on | 4.41s | 2.08× |
The three target different bottlenecks and are complementary: the prefilter dominates for C (many patterns with distinct lead bytes), the in_id_list() cache dominates for Vim script (large contains/cluster lists, e.g. netrw), and the saved-state hint gives a smaller, filetype-independent gain. Combined, that's roughly 2.2× on C-heavy code and 2.1× on Vim-script-heavy code, with highlighting output unchanged.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I was suggesting that it might be useful to expand upon the very basic existing optimisations as this would probably (not always easy to intuit) be generally useful as well, but let's put that aside.
It makes little sense to me that the pattern is being parsed by anything other than the regex engine. It introduces a whole class of problems that just don't need to exist. We've already identified two issues just by eyeballing the parser.
There's certainly no need to query both engines and the BT engine is probably easier to work with for current purposes. However, if we don't want to generate these results during compilation, and don't want to process the compiled pattern, then it would be better just to parse the output of the existing regdump(). It's certainly no uglier than the current approach, accurate, and we get to support all pattern constructs for free.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Thanks — I took the "let the regex engine do it" suggestion seriously and
measured it before deciding, because it would clearly be the cleaner design if
it carried its weight. The result changed my mind about what to extract, so
let me share the numbers.
Benchmark: a ~25,000-line C file (Vim's own change.c/buffer.c/eval.c/
syntax.c/regexp.c/ex_docmd.c/edit.c concatenated), highlighted with
filetype=c, full-buffer synID() cold pass, min of 3 runs. Speedup vs. the
prefilter disabled (test_override('syn_prefilter', 1)):
| signal used to reject a line | speedup |
|---|---|
first byte only (≈ regstart) |
~1.0x (no measurable gain) |
engine regmust → required bytes |
~1.03x |
| required-byte set (union of mandatory bytes) | ~1.43x |
| first-byte set + required bytes (the original PR) | ~1.42x |
Two takeaways:
The first-byte set contributes essentially nothing on this workload, and
it was also the riskiest part (alternation start-sets, \c, magic mode, the
under-approximation completeness you were rightly worried about). So I
removed it entirely.
The entire win comes from the required-byte set — the union of bytes that
must occur somewhere in any match. The engine's exposed analyses don't
reproduce it: regstart is a single first character, and regmust is a
single longest literal supplied only when the pattern is "potent enough" —
both measure ~1.0–1.03x here. To get the required-byte union from the
engine you'd have to add a new must-set traversal to regexp_bt.c (it isn't
an existing field, and it isn't in regdump() output either), i.e. new
engine code rather than reuse.
So the PR is now a required-byte prefilter only (~124 fewer lines): the
analyzer no longer tracks any start-byte set, just the conservative
required-byte set, bailing on anything it can't model soundly. Correctness is
covered by the on/off equivalence tests plus differential fuzzing over random
patterns (several thousand cases, no mismatches); and structurally, required-byte
rejection is a strict subset of the previous filter's rejections, so it can only
ever skip fewer lines, never wrongly skip one.
I'm happy to revisit an engine-side must-set traversal later if we want
correctness-by-construction, but given the data I'd rather not block this on new
engine code.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Something is getting lost in translation here...
To get the required-byte union from the engine you'd have to add a new must-set traversal to regexp_bt.c (it isn't an existing field, and it isn't in regdump() output either), i.e. new engine code rather than reuse.
I was being slightly facetious in suggesting that we parse the regdump() output. That would be silly, and I intended it more as a sign post, but it would still be better than a new pattern parser in syntax.c. I'm suggesting that this required character set be computed from an analysis of the compiled pattern. This poses no engine-side risks; it is read-only.
As I read it we have gone from 2.2x to 1.4x for the prefilter on the same big.c in the pursuit of correctness that the compiled pattern provides for free.
I can't justify the existence of the new pattern parser and wouldn't like to see it merged.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
One number first, since I think it's skewing the trade-off: the 2.2x is the
three optimizations combined, never the prefilter alone. The prefilter on its
own is ~1.57x (eval.c) / ~1.12x (netrw), and dropping the first-byte set took it
from ~1.42x to ~1.43x on the big file — so no prefilter performance was traded
for correctness. The required-byte set was always carrying essentially the whole
prefilter win.
On deriving the required set from the compiled pattern: I should separate two
things I've conflated with it before, because neither objection actually applies
to what you're proposing now.
The ~1.0-1.03x numbers I measured were regstart/regmust — the engine's
existing outputs. They don't carry the win because they're a single start
char and a single literal, not the required-byte union. That result only
says "reusing the existing fields doesn't work", not "an engine-derived
required set would be slow". A read-only walk that computes the union from the
regprog would compute the same signal the analyzer computes now, so it should
land at the same ~1.43x. So I'm not going to argue this on speed.
My earlier "too risky / new engine code" was about adding set-valued outputs to
the BT and NFA compilers — modifying execution-critical matching code. A
read-only walk of the compiled BT program isn't that, and you're right that it
carries no execution risk. I withdraw that framing for this proposal.
What's left is narrower but, I think, still the deciding factor: a regprog walk
couples this syntax-layer optimization to the engine's internal, undocumented
bytecode layout — the representation that is otherwise only surfaced for
debugging via regdump(). And it isn't free of the correctness burden either: a
required-byte walk still has to be written correctly per opcode (BRANCH =
intersect across alternatives, STAR/\? = contributes nothing, ANYOF = no single
required byte, the \{n,m} reversal cases, …), and a mistake there over-claims
required bytes and wrongly skips a line exactly as a source-level mistake would.
So it relocates the per-construct correctness work onto engine internals rather
than removing it.
Against that, the current analyzer is self-contained, depends on nothing private
to the engine, and fails safe: when it can't prove a construct sound it bails and
the pattern is always tried, so a gap costs a missed skip, never a wrong one. It
already bails on \{n,m}, character classes, and backslashes inside [...]; the
floor is "model less, bail more", which only ever costs speed.
I'm not opposed to a read-only regprog analysis as a later refinement if we
decide the bytecode coupling is acceptable. But given the corrected numbers and
the fail-safe behaviour, I'd rather land the self-contained version than gate it
on coupling to the compiled program.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I thought the multiple 2.2x numbers were coincidental as the initial measurements appeared to have been done before episodes 2 on and 3 had hit cinemas and unless I'm misreading it your second comment only lists prefiltering. It's surprising that the simplified modelling hasn't had any significant impact.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Alright, I'm fine with the current approach. Yes, it duplicates part of the regex grammar, but the bail-on-unknown pattern plus the equivalence validation of syn_prefilter on/off prevents us from a wrong highlighting. I think that is an acceptable trade off.
I have just one comment: please drop the useless l: prefix for the vim script variable names in the test.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
This effort is ambitious and commendable.
I wonder whether the reported speedup figures paint an
accurate picture. Since the proposed filter is not
dependent on regexp engines and it has to (1) parse each
pattern for its needs, (2) manage a set of patterns against
which it is capable of running own byte (ASCII only?) tests
on arbitrary input, and (3) defer further checks to a regexp
engine when a partial match can be made, does time spent on
the filter operations now contribute to :syntime figures
as, I presume, filtered out (i.e. not attempted) patterns
do? How about &redrawtime?
Um, the filter scope is way too off. There are over 750
distributed syntax files, about a dozen were tested against.
Can we keep this filter off unless expressly asked for? Why
go looking at every :syn-match and :syn-region pattern?
Let the authors of syntax rules pick and annotate expensive
definitions with a new, list-taking, command argument e.g.
optimizewith=foo,bar. Treat this argument as a hint; apply
requested optimisation when appropriate, otherwise ignore
them. (The new argument may be later supported with
:syn-keyword definitions if relevant optimisations for the
whole keyword set are discovered and implemented.)
Most importantly, as it has already been pointed out, the
filter ought to cooperate with regexp engines: let an engine
go on about its business unless it needs to backtrack over
previously accepted input, then query the engine for literal
parts and apply the filter prior to pending backtracking.
Every time the filter gives up and a regexp engine kicks in
is wasted work, let's keep it to a minimum.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I just ran some very quick tests and I do see exactly what I was suggesting in my previous comment. The first version of this PR gave produced approximately a 2.2x improvement and the current version only 1.4x.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@dkearns — you're right, and I owe you a correction on the speed story. My
framing earlier was off, and it was contradicting my own first numbers.
The facts, straight:
The first version of the prefilter alone was ~2.2x (big.c 6.70s → 3.00s
= 2.23x; eval.c 0.570s → 0.253s = 2.25x). So your reading — first version
~2.2x, current ~1.4x on big.c — is correct. My later claim that "the 2.2x was
the three optimizations combined, never the prefilter alone" was wrong: my
very first measurement post was the prefilter alone, and it was ~2.2x. Sorry
for the confusion that caused.
The drop did not come from removing the first-byte set — that step was
~1.42x → ~1.43x, i.e. nothing. It came from making the analyzer
safe-by-default. After the soundness bugs you and @chrisbra found (the
\{42,0} → \{0,42} reversal, and backslash escapes inside [...]), I
stopped modeling \{n,m}, character classes, and unknown escapes at all and
bail on them instead. That is the change that lowered the speedup: eval.c from
2.25x to ~1.57x, and the big.c you measured from ~2.2x to ~1.4x. Same change —
the exact ratio just depends on how many filterable patterns the file
exercises (eval.c stays higher than big.c).
So the honest statement is: yes, the prefilter dropped, and that cost was paid
for correctness — specifically to guarantee it can never skip a line a pattern
could match. It
wasn't a free simplification; it was the price of "never wrong highlighting".
Given that the failure mode is a silent loss of highlighting, I think that
trade is the right one, and with byte-for-byte identical output I still
consider it worth merging — but I should have said this plainly instead of
slicing the numbers.
Separately, I took your "do it on the compiled pattern" suggestion and actually
built and measured it. I'll post that as its own comment so this one stays just
about the correction.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
@zzzyxwvut — thanks, these are fair questions. Let me take the measurement and
the scope points here; the "cooperate with the regexp engine" point I'll answer
in a separate comment, because I actually built that version and want to show the
numbers rather than argue it in the abstract.
You're right to suspect :syntime here, and the answer is that it does not
see the filter at all — in either direction:
:syntime times only a pattern's syn_regexec() call (that's where theIF_SYN_TIME wrapper sits). The filter's own work — building the line's bytesyn_start_line(), and the required-byte test insyn_line_cannot_match() — runs outside syn_regexec(), so it never shows:syntime total.:syntime (as you noted). That makes :syntime the wrong tool for thisSo the filter's cost has to be measured as wall-clock, with the filter fully on,
against a baseline that has it fully off. That is exactly how I benchmarked it:
the baseline is test_override('syn_prefilter', 1), which disables the rejection
entirely, so the filter's analysis, the per-line byte-set construction and every
required-byte test are all counted on the "on" side. The speedup is therefore
net of the filter's own overhead, not gross. (Full harness and a 3-way table are
in the separate comment.)
&redrawtime is a timeout, not a cost: the highlighting result is unchanged, so
the timeout behaviour is unchanged — the filter only makes the work finish
sooner, it never makes a redraw take longer.
I'd rather keep it applied to every pattern and not add an optimizewith= hint,
and the reason is that the safety doesn't depend on which files were tested:
synID() output with the filter on and off, but the guarantee for the othersp_filter_active = false, and the runtimesyn_line_cannot_match() is everAn optimizewith= annotation would invert that: it would put work on the author
of every one of those 750+ syntax files, ask them to judge which definitions are
expensive, and silently lose the speedup wherever the annotation is missing or
wrong. So I think automatic-and-safe is the better default here, and I'd prefer
not to add a knob for it.
(The engine-cooperation point follows in the next comment, with the prototype and
measurements.)
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I took the "derive the required-byte set from the compiled pattern" suggestion
and actually built it, so the comparison is measured rather than predicted. I
wrote a read-only walk of the compiled backtracking-engine program (the BT
engine, which is the easier one to walk, as you noted) and put it behind an env
switch so it can be A/B'd against the current source-level analyzer at syntax
definition time. Full diff:
h-east/vim@syntax-leadbyte-prefilter...syntax-prefilter-regprog-experiment
— the key code, method and numbers are inlined below.
Result. The program walk is sound (byte-for-byte identical highlighting on
the on/off differential test, plus test_syntax and test_highlight), but on a
25,000-line C file it reaches ~1.33x vs the source analyzer's ~1.47x with
the prefilter disabled as the baseline. So deriving the set from the compiled
program is feasible and correct here, but it does not gain speed — it comes out
a bit slower — and it adds a dependency on the BT bytecode layout. My earlier
"should land at the same ~1.43x" was therefore slightly optimistic.
Two soundness bugs surfaced while writing the walk, both of the "silently
over-claim a required byte and drop a real match" kind: handing the \%#=N
engine-selection prefix to the BT engine raised E1281, and inline \c was
initially ignored so \cWORD required the bytes WORD and skipped a line
containing word. They were straightforward to fix (strip the prefix; read
RF_ICASE/RF_NOICASE from the compiled program), but they make the point
concretely: moving onto the compiled program relocates the per-construct
correctness work, it does not remove it.
| mode | median (s) | speedup |
|---|---|---|
prefilter disabled (test_override('syn_prefilter', 1)) |
| 2.223 | 1.00x | |
| source-level analyzer (this PR) | 1.507 | 1.47x |
| read-only walk of the compiled BT program |
| 1.671 | 1.33x |
Added to regexp_bt.c, next to the bytecode it reads. It compiles the pattern
with the BT engine, then walks the program: concatenation unions the required
bytes, alternation intersects across branches, character classes / ANY
contribute no specific byte, * / \? contribute nothing, \+ keeps its atom,
and anything not obviously sound bails (so the pattern is always tried). The
ignore-case decision is taken from the compiled program's flags, which is
exactly the kind of thing the engine has already resolved for us.
// "req" is a 32-byte (256-bit) set. Bail => caller always tries the pattern. static void req_chain(char_u *scan, char_u *stop, char_u *req, int ic, int depth, int *bail) { while (scan != NULL && scan != stop && !*bail) { int op = OP(scan); char_u *next = regnext(scan); // One character matched, but no specific byte is required. if ((op >= ANY && op <= NUPPER) || (op >= ANY + ADD_NL && op <= NUPPER + ADD_NL)) { scan = next; continue; } switch (op) { case END: return; // zero-width / pass-through (BOL EOL BOW EOW BOF EOF NOTHING BACK, // and the MOPEN/MCLOSE markers for \( \) \zs \ze): no required byte case BOL: case EOL: case BOW: case EOW: case RE_BOF: case RE_EOF: case NOTHING: case BACK: case MOPEN + 0: /* ... MOPEN + 9 */ case MCLOSE + 0: /* ... MCLOSE + 9 */ break; case EXACTLY: // literal run: every byte is mandatory req_add_exactly(OPERAND(scan), ic, req, bail); if (*bail) return; break; case STAR: case BRACE_SIMPLE: break; // 0 or more: operand optional case PLUS: // 1 or more: operand matched at least once if (OP(OPERAND(scan)) == EXACTLY) { req_add_exactly(OPERAND(OPERAND(scan)), ic, req, bail); if (*bail) return; } break; case BRANCH: // alternation: intersect across branches { char_u acc[REQ_BSET_SIZE], tmp[REQ_BSET_SIZE]; char_u *lastb = scan, *join, *b; int first = TRUE; if (depth >= 12) { *bail = TRUE; return; } for (b = scan; b != NULL && OP(b) == BRANCH; b = regnext(b)) lastb = b; join = regnext(lastb); for (b = scan; b != NULL && OP(b) == BRANCH; b = regnext(b)) { vim_memset(tmp, 0, REQ_BSET_SIZE); req_chain(OPERAND(b), join, tmp, ic, depth + 1, bail); if (*bail) return; if (first) { mch_memmove(acc, tmp, REQ_BSET_SIZE); first = FALSE; } else req_bset_and(acc, tmp); } if (!first) req_bset_or(req, acc); scan = join; continue; } default: // BRACE_LIMITS/COMPLEX, BEHIND, MATCH, *bail = TRUE; // NOMATCH, SUBPAT, BACKREF, NEWL, return; // MULTIBYTECODE, RE_COMPOSING, ... : bail } scan = next; } } int bt_prog_required_bytes(char_u *pattern, int re_flags, int ic, char_u *bset) { bt_regprog_T *prog; int bail = FALSE; char_u *pat = pattern; vim_memset(bset, 0, REQ_BSET_SIZE); // Skip an engine-selection prefix "\%#=N"; the BT engine cannot parse it. if (pat[0] == '\\' && pat[1] == '%' && pat[2] == '#' && pat[3] == '=' && pat[4] != NUL) pat += 5; rex.reg_buf = curbuf; ++emsg_off; // a pattern the BT engine rejects just bails prog = (bt_regprog_T *)bt_regengine.regcomp(pat, re_flags); --emsg_off; if (prog == NULL) return FAIL; // Inline "\c"/"\C" override the syntax-level case setting. if (prog->regflags & RF_ICASE) ic = TRUE; else if (prog->regflags & RF_NOICASE) ic = FALSE; req_chain(prog->program + 1, NULL, bset, ic, 0, &bail); vim_regfree((regprog_T *)prog); return bail ? FAIL : OK; }
syntax.c chooses between the two analyzers at definition time via an env var,
so the same binary measures both:
static int use_prog = -1; if (use_prog < 0) use_prog = mch_getenv((char_u *)"SYN_REQ_PROG") != NULL; if (use_prog) // fill ci->sp_req via bt_prog_required_bytes() instead of the source walk
Workload: a 25,000-line C file (Vim's own
eval.c+syntax.c+regexp.c+buffer.c+ex_docmd.c+edit.c+change.c
concatenated), filetype=c, full-buffer synID() sweep over every line and
column; 10 runs, median reported.
One detail worth flagging for anyone reproducing this: the sweep loop must run
inside a Vim9 :def function. A legacy-script loop is interpreted and its
per-iteration overhead dominates the wall time, which dilutes the ratio to about
1.13x and hides the real effect. Compiling the loop (Vim9 :def) removes that
overhead and the syntax work becomes the dominant cost — that is what makes the
source analyzer reproduce its expected ~1.43–1.47x.
vim9script syntax on execute 'edit ' .. getenv('BENCH_FILE') set ft=c syntax sync fromstart if getenv('BENCH_MODE') == 'off' test_override('syn_prefilter', 1) # disable the prefilter => baseline endif synID(1, 1, 1) def Sweep(): float var last = line('$') var start = reltime() for lnum in range(1, last) var lc = col([lnum, '$']) var c = 1 while c < lc synID(lnum, c, 1) c += 1 endwhile endfor return reltimefloat(reltime(start)) enddef writefile([printf('%s %.4f', getenv('BENCH_MODE'), Sweep())], getenv('BENCH_OUT'), 'a') qa!
Run with the prefilter disabled (off), with the source analyzer (no env), and
with the program walk (SYN_REQ_PROG=1).
With SYN_REQ_PROG=1, test_syntax (34) and test_highlight (51) pass, and the
existing on/off differential test (Test_syntax_prefilter_unchanged) is
byte-for-byte identical, so the program-derived set is sound on the tested
filetypes.
The ~1.33x vs ~1.47x gap is because this prototype bails more than the source
analyzer: it gives up on BRACE_*, MATCH/NOMATCH/SUBPAT, MULTIBYTECODE
and a few others, where the source-level parser still extracts some required
bytes. Closing the gap means modeling more opcodes — i.e. adding back the
per-construct work that doing it on the compiled program was meant to avoid. So
the compiled-program route trades one grammar to track (the pattern syntax) for
another (the bytecode), without a speed win to show for it.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I think it would be best to pursue maximum performance, at least in the longer term. I think the 2.2x to 1.4x hit is far too much of a compromise. I never quite understood the motivation for dropping things like \{ other than it being a simplification. The whole endeavour has risks like any optimisation work and I think these bugs should just be fixed.
Thanks for building the prototype.
I'm not sure I understand the performance comparison. Performance (as measured by :syntime at least) is entirely dependent on the quality and not the source of the analysis as you later point out under "Correctness and the gap".
The analysis has to be done somewhere and I still don't see any downside to performing it on the compiled program. For example, instead of having to parse the \{ quantifier grammar, and being surprised by the sneaky max-min reversal, the normalised min/max values are presented on the proverbial silver platter. It also properly handles parsing collections with backslash escaped characters.
I was about to add a reminder that the limitations should be documented, like not supporting very magic patterns. This approach also supports those for free. For a user not supporting \v patterns makes no sense, because there is no satisfactory answer at the feature level.
I think the normalised format of the compiled pattern is a clear winner.
The analysis may also be useful in other contexts unrelated to syntax highlighting and while I suppose that could be executed in the syntax file and presented independently it's certainly not a natural organisation.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Thanks, but I disagree and want to keep the analysis on the source pattern.
We agree on performance: at equal completeness both routes produce the same
filter and the same :syntime, so the prototype's 1.33x vs 1.47x was bail-driven,
not a source-vs-compiled effect. That reframes the question rather than settling
it — what matters is the cost of maintaining that completeness and which grammar
the analyzer tracks. That's a maintenance argument, and it favours the source:
The source grammar is public, documented (:help pattern) and backwards
compatible. The bytecode is an internal representation: no stable contract,
unversioned, already split across BT and NFA. Deriving the set from the program
couples a user-facing feature to today's BT opcode layout, and an engine change
(new opcode, different BRACE_*/composing emission, an NFA-only construct)
breaks the prefilter silently, with no compiler error to catch it.
"Normalised values on a silver platter" cuts both ways: that normalisation
lives inside the engine, out of our control. On the source side the min/max,
magic-level and collection handling are in our code, where a change is one we
make and test.
The "second grammar" concern applies to both routes — a bytecode walk is also a
second interpreter, tracking an undocumented grammar instead of a stable one.
It's not hypothetical: the prototype hit two soundness bugs straight from
reading the bytecode (\%#=N raised E1281; inline \c was ignored, so
\cWORD required WORD and skipped a line with word).
Dropping \{ was never only the min/max swap — it's the bail policy: don't
interpret a construct whose soundness isn't self-evident. Recovering it from the
bytecode doesn't make it self-evident, it just rides on the engine's current
normalisation. The ~1.4x is a deliberate trade: a smaller, sound gain with no
permanent dependency on internal layout. We can extend the source analyzer later
if a construct becomes self-evidently sound.
So I'd like to keep the source-level analyzer with bail-by-default. I have no
wish to maintain — or to use — a fragile, glass-like Vim that is a burden to keep
working.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
The continuing suggestion that the BT bytecode is some undocumented, unreliable, and ever shifting morass is simply ahistorical, at best. It has survived the better part of forty years without modification, beyond additions, and survived the introduction of a brand new engine.
The BT engine is the reference implementation. There's nothing "split across two engines", the two engines are contractually identical. Anything else is a bug.
Dropping { was never only the min/max swap — it's the bail policy: don't
interpret a construct whose soundness isn't self-evident. Recovering it from the
bytecode doesn't make it self-evident, it just rides on the engine's current
normalisation.
I have no idea what this actually means. This "self-evident" bail policy isn't engineering rigour, it's a purely subjective judgement the consistency of which can't be maintained. What exactly is unsound or not self-evident in handling a bounded quantifier? In the bytecode analysis it's two boolean tests against the engine's already-decoded bounds. Riding on the engine's normalisation isn't some defect, the engine defines what \{ matches. The new hand-rolled parser is riding on an interpretation of the incomplete documentation. It missed the corner case, passed testing, and then rather than fixing the bug just dropped it for now falling below the nebulous bail out threshold.
The soundness of the source parser itself isn't self-evident either, it's confirmed by testing, not by inspection.
The optimiser is hopefully invisible with respect to correctness, but its performance quirks are leaking. You can't reliably reason about the performance because its work essentially turns on accidental pattern spellings. It is now in the indefensible state of working its magic on foo but not \vfoo, and on \+ but not \{1,}, to list just two examples. Performance tuning syntax patterns is already arcane enough, but now we'll also have to deal with recommending that users avoid very-magic patterns. We'll have to suggest experimenting with replacing quantifiers and a host of other tweaks to work around the arbitrary, undocumented bail policy. An optimiser that can't reliably process Vim's own regex patterns without this many jagged edges isn't something we should be shipping.
You seem to be presenting this as fixed in stone once it's merged, but it likely won't be. There are other maintainers and countless contributors who will want to improve the optimisations once they stumble across this implementation. This area of optimisation is well understood, so others will want to add the usual improvements, and their bail out threshold will not match yours.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Fair points, and let me concede the ones the prototype settles. The program
walk is sound on everything we have tested — synID() is byte-for-byte
identical with it on and off — and on a couple of constructs it genuinely covers
more than the source walk: because it analyses the compiled pattern, \v and
collections with escaped characters arrive already normalised, where the source
walk currently bails. Credit where it is due; I am not claiming the bytecode is
a shifting morass.
On performance and effort, a falsifiable test beats trading my numbers. The
experiment branch is already pushed: a diff on it that extracts a sound
required-byte set from \{n,m} — your "two boolean tests against the decoded
bounds" — and comes out ahead of the source analyzer on the 25k-line C sweep
with synID() unchanged would settle it directly, and if it is that cheap it
should be a small patch. As it stands the prototype is the opposite of free: it
bails on BRACE_*, MATCH, MULTIBYTECODE and others, measured slower as built
(~1.33x vs ~1.47x), and reading the program needed its own per-construct
correctness work — two soundness bugs during development (\%#=N → E1281;
inline \c ignored).
But I want to be straight that this would not move my decision, and why. The
question was never whether the program walk can be made sound — it can. It is
whether a documented, user-facing feature should derive its behaviour from an
undocumented internal representation when the documented :help pattern grammar
gets us there at equal or better speed. That is a maintainability judgement, not
a correctness one, and on that axis I will keep the analysis on the source
grammar even if the compiled route is made faster.
It is not fixed in stone either way. The durable contract is the on/off
differential test: any decline is safe by construction — a smaller required set
can only make the filter skip fewer lines, never the wrong one — so coverage can
be widened by anyone and the test catches an unsound over-approximation
immediately, whatever their threshold. That guardrail is what makes it safe to
extend, and it is why I would rather extend code that tracks the stable,
versioned grammar than the BT program.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
The lead-byte prefilter now handles \{n,m} multis instead of bailing out on them.
Previously syn_fa_concat() gave up on any pattern containing a \{n,m} repeat and fell back to a full match for the whole line. The brace-limit parsing has been extracted out of read_limits() into a shared regpat_get_brace_limits() (in regexp.c), which works purely on the pattern syntax and is independent of any compiled program, so both the regexp engine and the prefilter can use it.
The prefilter uses the parsed limits to decide whether the preceding atom is optional: it is optional when the smaller bound is zero (e.g. \{0,3}, \{-}), and mandatory otherwise (like \+). The same normalisation — default min of zero, default max of a very big number, and - range reversal — is shared with the engine, so there is a single source of truth for the \{...} grammar.
No behavior change for the regexp engine itself; read_limits() is now a thin wrapper that keeps its existing error reporting.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
The required-byte prefilter now handles multibyte literals instead of bailing on them.
Previously syn_fa_atom() gave up on any byte >= 0x80, so a pattern containing a multibyte literal was never prefiltered: its required-byte set stayed empty and every line was treated as a candidate.
The line byte set is built byte-by-byte over the raw line, so requiring all bytes of a multibyte literal is sound: the character can only match when each of its bytes occurs in the line. Those bytes are now added to the required set. Under ignore case a different-case form may match instead, so the bytes are only required when the character has no case variants (utf_fold() for UTF-8, the case test for a single high byte, and skipped for DBCS where invariance cannot be proven cheaply).
This is purely additive and behavior-preserving, and inherits the same 'encoding'-stability assumption that the compiled sp_prog already has.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
The required-byte prefilter now handles \c/\C and \V instead of bailing on them.
\c/\C anywhere force case (in)sensitivity for the whole pattern. They are resolved up front by scanning the pattern (per character, so an escaped backslash or a DBCS trailing 0x5C is not mistaken for a toggle), with \c winning over \C exactly as the engine resolves the flags; the analyzer then runs with that ic.
Under \V (very nomagic) every byte up to the next backslash is a literal, so all of those bytes are required. A \|/\&/\) there is a branch/group boundary left to the caller; any other escape (a \-quantifier, a magic reset, ...) introduces constructs not modeled here, so it bails. This stays sound in every direction: the quantifier/escape cases bail rather than over-claim, and losing the very-nomagic state across \| only ever drops required bytes (the branch is then read in default magic, which marks more characters special), so the computed set is always a subset of the truly-required bytes — it can only skip fewer lines, never wrongly skip one.
The single-byte/multibyte literal handling is factored into one helper and reused for plain literals and the \V run. Verified by an on/off equivalence test (including adversarial \Vfoo\? / \Vab\{0,2}c cases that would drop a match if the run were over-required) and the full runtime/syntax screendump suite. On a \V/\c-heavy workload (120 patterns) it lifts the prefilter from ~1.0x (everything bailed before) to ~1.74x; \V is particularly strong since its literal-heavy patterns yield dense required-byte sets.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
So let's start with documenting what this is for and when to use it. Can you please add this part here at the end of :h syntax.txt
*:syntax-prefilter*
To avoid wasted work Vim uses a "required-byte" prefilter. When a syntax
pattern is defined, Vim works out which bytes must appear in any text the
pattern can match. Before trying the pattern on a line it checks the line for
those bytes and skips the pattern when one of them is missing, without running
the regexp engine at all. This is why including literal text, as above, helps:
it gives the prefilter something to key on.
This only makes highlighting faster, it never changes the result. The
highlighting is the same whether a pattern is skipped or not.
The prefilter applies only to the patterns of |:syn-match| and |:syn-region|
items. It does not affect matching functions such as |match()|, |search()| or
|substitute()|.
The analysis is deliberately careful: it speeds up only the patterns it can
analyse with certainty and does nothing for the rest. Which patterns benefit
therefore depends on how a pattern is written; two patterns that match the same
text are not necessarily optimized the same way. There is no option to
configure this and nothing to tune.
Because a skipped pattern is not tried, it is not counted in a |:syntime|
report for the lines where it was skipped, and may be absent from the report
entirely if it was skipped on every line that was redrawn.
If you suspect the prefilter causes wrong highlighting, you can turn it off to
check, with |test_override()|: >
:call test_override('syn_prefilter', 1)
:syntax sync fromstart
:redraw!
<Switch it back on again, together with any other overrides, with: >
:call test_override('ALL', 0)
<If the highlighting changes when the prefilter is off, that is a bug worth
reporting; the result is meant to be identical either way.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
So let's start with documenting what this is for and when to use it. Can you please add this part here at the end of :h syntax.txt
Added. Thanks!
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
Squashed into a single commit and updated the commit message and PR description to reflect the latest changes.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()
I poked about (with build 1343c1be6a) a few other filetypes
and measured elapsed time with the offered script and found
outliers among them. On some syntactically valid input,
deactivating the byte filter occasionally improves syntax
rendering performance, else the reported speedup using it is
negligible for the same input:
SECURITY.md,runtime/syntax/generator/README.md,.github/ISSUE_TEMPLATE/feature_request.md).runtime/tools/mve.awk andruntime/doc/maketags.awk; but contrast it withruntime/doc/makehtml.awk).screenrc file (see #20550)Fetch this branch.
cp runtime/syntax/testdir/input/screen.screenrc /tmp/screenrc cd src/testdir head bench.vim (p=../../runtime/syntax/testdir/input; \ for f in /tmp/screenrc $p/brainfuck_hello.b $p/whitespace_hello.whs $p/srt_sample.srt ../../SECURITY.md ../../runtime/syntax/generator/README.md ../../.github/ISSUE_TEMPLATE/feature_request.md ../../runtime/doc/maketags.awk ../../runtime/doc/makehtml.awk ../../runtime/tools/mve.awk; \ do \ echo "$f" >> /tmp/bench.log; \ for k in off on; \ do \ VIMRUNTIME=../../runtime BENCH_FILE="$f" BENCH_OUT=/tmp/bench.log BENCH_MODE="$k" ../vim --clean -S bench.vim; \ done; \ echo >> /tmp/bench.log; \ done) vim /tmp/bench.log
To somewhat complement our tooling for measuring coarse
elapsed time, I also made use of another, more horrendous
device we have for syntax tests -- logging busy-loop counts
it takes to scroll a screenful of lines ("page") of a test
file before taking a snapshot.
export VIM_SYNTAX_TEST_WAIT_TIME=1 export VIM_SYNTAX_TEST_LOG=/tmp/test.log touch -a "$VIM_SYNTAX_TEST_LOG" cd runtime/syntax make clean brainfuck srt whitespace test VIM_NO_SYN_PREFILTER=1 make clean brainfuck srt whitespace test
Despite the lack of precision, loop totals corroborate
reltime figures for speedups and slowdowns: there can be
less accumulated loops (on an idle machine) for file inputs
that benefit from using the filter and the file under test
is scrolled quicker.
As can be seen from perusing reports, applying the filter on
certain combinations of valid input and syntax rules may not
be that desirable in its current form. (Do we need to always
populate the byte set for a line (current_line_bset) if we
can disable the filter at will?)
We lack cost reports of using the filter in other common
workflows:
:syntax sync fromstart away from the:syn-sync-minlines etc. settings for surroundingsyntax/testdir/README.txtIf it turns out that the filter is most cost-effective for
certain workflows then only such identified workflows should
have it. (Hence my suggestion to treat any potential opt-in
flags as hints.)
It is fine to use reltime for prototyping the filter to see
if the work worth pursuing, but we have :syntime as intended
API for syntax-related measurement, and incurred overhead
from preparing and using the filter should also tally with
reports.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications, keep track of coding agent tasks and review pull requests on the go with GitHub Mobile for iOS and Android. Download it today!
You are receiving this because you are subscribed to this thread.![]()