Syntax highlighting performance adversely affected by inefficient use of in_id_list()

26 views

Skip to first unread message

Brett Stahlman

unread,

Jun 2, 2023, 10:18:38 AM6/2/23

to vim_dev

Some time ago, while testing various configurations of a plugin that makes very heavy use of syntax regions, I noticed that in certain configurations, redraws were timing out after the default 2 second 'redrawtime'. In an attempt to understand why, I profiled Vim and discovered that much of the time was spent in the in_id_list() function, which was called over 115 million times in the space of a few seconds. Shocked that any function could need to be called so frequently, I began digging around in the Vim source to understand what was happening. What I discovered was that in_id_list() is used in a very inefficient strategy for finding all syntax groups in the current group's 'contains' or 'nextgroup' list. The calls are made within syn_current_attr() in a loop over *all* syntax items, and because in_id_list() performs a linear search of the 'contains' or 'nextgroup' list for each iteration of this outer loop, the cost of the algorithm is O(M x N), where M is the number of defined syntax items, and N is the size of the current item's 'contains' or 'nextgroup' list. Moreover, this nested loop is executed each time the current syntax item changes, so buffers with lots of syntax regions in a small screen area will be hit especially hard.

This approach seemed needlessly inefficient. Clauses like 'nextgroup' and 'contains' provide constraints that ought to make finding next/contained groups faster, but the current implementation doesn't take advantage of this. Also, the searches of these lists are linear, and thus, slower than they could be. As a proof-of-concept, I prototyped an algorithm that attempts to "invert" the following loop within syn_current_attr():

for (idx = syn_block->b_syn_patterns.ga_len; --idx >= 0; ) { ... }

Under the prototyped approach, iteration is controlled by an inline stateful iterator function called syn_current_attr_loop_iter(), which yields the next syntax item to check each time it's called. The function operates in several modes, which govern how the list of candidate items is generated. At the fast end of the spectrum is the "IDS" mode, which iterates over syntax groups and clusters in the current item's 'contains' or 'nextgroup' list; at the slow end of the spectrum is the "SLOW" mode, which iterates over *all* syntax items, just like the original implementation on master. In between the two extremes are modes "IDXS" and "CTDIN": the former handles virtual groups like TOP and CONTAINED; the latter ensures we check items that aren't in a 'contained' list, but mention the current group in their own 'containedin' list. In certain scenarios, special iterator logic ensures "fall back" to "CTDIN" mode after all the groups found by "IDS" mode have been processed. Partially sorted, denormalized growarrays have been added to support some of this, and the search within in_id_list() has been changed from linear to binary. Although I didn't do exhaustive testing on my prototype, I did see a significant performance improvement with no loss of functionality on a very complex syntax. For instance, a single redraw that had taken 3.38 seconds on master took only 0.96 seconds on the prototype branch. The performance increase was even more dramatic before I used the information gleaned during my investigation to optimize my plugin's syntax definition. No doubt, developers more familiar with Vim's syntax engine could achieve much greater performance gains. The handling of 'containedin' could definitely be improved. In my prototype, I consider all items with a 'containedin' clause to be candidates for inclusion in the current group; however, it should be possible to create additional denormalized data structures representing the reverse associations (container => containee) and update them whenever a 'containedin' clause is processed. Complicating this approach is the fact that the containing group's syntax items may not yet exist when the corresponding 'containedin' clause is processed, so there would probably need to be a mechanism for expanding placeholders at the point of first use.

For anyone interested, the prototype is on branch "invloop" of the following (Vim 8) fork:

https://github.com/bpstahlman/vim.git

The plugin I used for benchmarking and testing is Txtfmt.

https://github.com/bpstahlman/txtfmt

This plugin is a natural stress test because it creates *many* syntax regions (thousands in some configurations) and its test page creates lots of regions in a single screen.

Thanks, Brett S.

Bram Moolenaar

unread,

Jun 2, 2023, 3:13:12 PM6/2/23

to vim...@googlegroups.com, Brett Stahlman

Brett Stahlman wrote:

> Some time ago, while testing various configurations of a plugin that makes
> very heavy use of syntax regions, I noticed that in certain configurations,
> redraws were timing out after the default 2 second 'redrawtime'. In an
> attempt to understand why, I profiled Vim and discovered that much of the
> time was spent in the in_id_list() function, which was called over 115
> million times in the space of a few seconds. Shocked that any function
> could need to be called so frequently, I began digging around in the
> Vim source to understand what was happening. What I discovered was
> that in_id_list() is used in a very inefficient strategy for finding
> all syntax groups in the current group's 'contains' or 'nextgroup'
> list. The calls are made within syn_current_attr() in a loop over
> *all* syntax items, and because in_id_list() performs a linear search
> of the 'contains' or 'nextgroup' list for each iteration of this outer
> loop, the cost of the algorithm is O(M x N), where M is the number of
> defined syntax items, and N is the size of the current item's
> 'contains' or 'nextgroup' list. Moreover, this nested loop is executed
> each time the current syntax item changes, so buffers with lots of
> syntax regions in a small screen area will be hit especially hard.

[...]

Thanks for looking into this. Although your use case appears to be very
specific. I was planning on taking another look at syntax highlighting
mechanisms, but several other things will come first, thus it might be a
while before I get to it. That is going to be in the context of ideas
to use treesitter and TextMate. After making changes for that
performance probably has to be tuned again.

The syntax engine is very complex. Making changes always has the risk
of introducing bugs. There is not that much testing at the moment.
Ideally every syntax plugin comes with at least a minimal test. I know
some authors have their own tests, which is fine to check the plugin
before sending out a new version. But it is not useful for checking the
implementation. Also, every author has to reinvent the wheel.

It would be very helpful to start by adding a few syntax plugin tests,
with a generic way to do this. Once we have the setup and a few
examples, we can ask others to create more tests. Using screendump
tests would be the simplest and most comprehensive. But executing them
would be a bit slow. We can run them separately from "make test" to
avoid slowing everything down. The indent tests can function as a rough
example, have a "syntax.ext" file as input and a "syntax_ext.dump" file
as a reference output. Something has to be added for variants with
settings.

Once we have some number of tests like this I would feel more confident
in changing the code to optimize for performance. The tests could also
output timing to be able to compare before/after performance. Comparing
times only works locally, since every setup is different.

--
hundred-and-one symptoms of being an internet addict:
96. On Super Bowl Sunday, you followed the score by going to the
Yahoo main page instead of turning on the TV.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Brett Stahlman

unread,

Jun 2, 2023, 9:54:08 PM6/2/23

to vim_dev

No problem. Understood.

The syntax engine is very complex. Making changes always has the risk
of introducing bugs. There is not that much testing at the moment.
Ideally every syntax plugin comes with at least a minimal test. I know
some authors have their own tests, which is fine to check the plugin
before sending out a new version. But it is not useful for checking the
implementation. Also, every author has to reinvent the wheel.

It would be very helpful to start by adding a few syntax plugin tests,
with a generic way to do this. Once we have the setup and a few
examples, we can ask others to create more tests. Using screendump
tests would be the simplest and most comprehensive. But executing them
would be a bit slow. We can run them separately from "make test" to
avoid slowing everything down. The indent tests can function as a rough
example, have a "syntax.ext" file as input and a "syntax_ext.dump" file
as a reference output. Something has to be added for variants with
settings.

I think this would be extremely useful. I've definitely wanted this sort of capability to validate changes made to my own plugin. When I wrote the initial version years ago, I went a bit crazy with configuration options. Now, with so many option permutations, it's easy to add a feature that works properly for the most common configurations but breaks one of the corner cases. I searched the Vim scripts site some time ago looking for some sort of test framework with a descriptive syntax for test specification. I wasn't even aware of the screendump API before today. I'm assuming I could use this mechanism to test my own filetype and syntax plugins, provided I maintained my own 'testdir/dumps' locally.