[vim/vim] [proposal] Can we introduce TextMate grammar system for syntax highlighting ? (Issue #9087)


Linwei

Nov 4, 2021, 1:24:23 AM
to vim/vim, Subscribed

Current problem

The current syntax highlighting system is very slow, and there are noticeable lags when scrolling large C++ files which contain complex syntax elements.

Issues of using a separate process

Previously, most people have suggested something like nvim-treesitter, which analyzes source code in a background treesitter process and renders keywords in the foreground with text properties.

But is it a good idea? I don't really think so. There are at least 4 disadvantages to treesitter solutions:

  • power consumption: an extra background job is required, reducing battery life and producing more carbon dioxide.
  • buffer sync: code must be written very carefully to guarantee that the source code in the two processes (vim/treesitter) is the same; coc/nvim-treesitter sends the whole buffer to the background every time changetick increases to prevent such mismatches, which is a little flaky.
  • reliability: an external program installed by the user is not reliable enough; there are plenty of version-compatibility and environment errors, and people need to make extra effort to get syntax highlighting to work.
  • poor parse quality: treesitter is a great project, but language parsers are contributed by people all over the world and their quality is not under control (performance issues or inaccurate results in certain languages).

Background syntax highlighter is still immature, there are still many other strange issues in nvim-treesitter:

https://github.com/nvim-treesitter/nvim-treesitter/issues

If we introduce something like this, we shall take all these issues into account.

TextMate grammar system

Syntax highlighting is the most important part of an editor, so it had better not rely on any uncontrollable external programs.

We need something new that can satisfy the goals below:

  • good performance
  • robust and reliable
  • accuracy
  • low power consumption
  • work in the same process (not require external programs)

And TextMate's grammar engine is really a good candidate: it is widely used in many IDEs/editors, including vscode (see syntax-highlight-guide for details), sublime and many others.

VS Code uses TextMate grammars as the syntax tokenization engine. Invented for the TextMate editor, they have been adopted by many other editors and IDEs due to the large number of language bundles created and maintained by the Open Source community.

TextMate grammars rely on Oniguruma regular expressions and are typically written as a plist or JSON. You can find a good introduction to TextMate grammars here, and you can take a look at existing TextMate grammars to learn more about how they work.

The grammar can be defined in JSON, which means it can be translated into Vim script or shipped as plain JSON files.
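To make "a grammar is a collection of regular expressions" concrete, here is a minimal sketch in C of a rule table driving a tokenizer. It is an illustration under stated assumptions, not how vscode-textmate or any proposed Vim feature works: it uses POSIX extended regexes from `<regex.h>` instead of Oniguruma, supports only flat "match" rules (no nested begin/end rules), and the names `Rule`, `Span`, `RULES`, `tokenize_line` and the scope strings are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* One grammar rule: a regex plus the scope name it assigns.
 * Assumption: every pattern starts with '^' and puts the token itself
 * in capture group 1, so a rule can peek at the following character
 * without consuming it. */
typedef struct {
    const char *pattern;
    const char *scope;
} Rule;

/* A highlighted range inside one line. */
typedef struct {
    size_t start;
    size_t len;
    const char *scope;
} Span;

/* The "grammar": plain data, tried in order at each position. */
static const Rule RULES[] = {
    { "^(//.*)",                                  "comment.line"     },
    { "^(\"[^\"]*\")",                            "string.quoted"    },
    { "^(if|else|while|return)([^A-Za-z0-9_]|$)", "keyword.control"  },
    { "^([0-9]+)",                                "constant.numeric" },
    { "^([A-Za-z_][A-Za-z0-9_]*)",                "variable.other"   },
};
#define NUM_RULES (sizeof RULES / sizeof RULES[0])

/* Tokenize one NUL-terminated line; returns the number of spans written.
 * Returns 0 if a pattern fails to compile. */
size_t tokenize_line(const char *line, Span *out, size_t max_spans)
{
    regex_t compiled[NUM_RULES];
    size_t i, n = 0, pos = 0;

    assert(line != NULL && out != NULL);

    for (i = 0; i < NUM_RULES; i++) {
        if (regcomp(&compiled[i], RULES[i].pattern, REG_EXTENDED) != 0) {
            while (i > 0)
                regfree(&compiled[--i]);
            return 0;
        }
    }

    while (line[pos] != '\0' && n < max_spans) {
        int matched = 0;
        for (i = 0; i < NUM_RULES; i++) {
            regmatch_t m[2];
            /* '^' anchors at line + pos because that is where the
             * string handed to regexec() begins. */
            if (regexec(&compiled[i], line + pos, 2, m, 0) == 0
                    && m[1].rm_eo > 0) {
                out[n].start = pos;
                out[n].len = (size_t)m[1].rm_eo;
                out[n].scope = RULES[i].scope;
                n++;
                pos += (size_t)m[1].rm_eo;
                matched = 1;
                break;
            }
        }
        if (!matched)
            pos++;  /* whitespace or punctuation: no scope assigned */
    }

    for (i = 0; i < NUM_RULES; i++)
        regfree(&compiled[i]);
    return n;
}
```

Because the rules are data, the table could in principle be loaded from a JSON file rather than compiled in; a real implementation would also compile the patterns once per grammar instead of once per line.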

Possible Solution

We can specify which grammar engine to use for the given buffer:

  • default engine: current vim's regex grammar
  • textmate engine: textmate grammar system.

And some new commands can be used to change the grammar engine:

:syntax grammar textmate
:syntax grammar default
:syntax load ~/.vim/syntax/cpp.json

For example, the snippet below can be included at the head of syntax files:

if has('textmate')
    syntax grammar textmate
    syntax load syntax/cpp.json
    finish
endif

....

And lots of existing vscode/textmate syntax files can be reused with minimal modification.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub.
Triage notifications on the go with GitHub Mobile for iOS or Android.

Bram Moolenaar

Nov 4, 2021, 7:18:42 AM

Thank you for starting this discussion. I had a vague plan to look into integrating treesitter, it is good to know it also has disadvantages. Vscode is widely used, thus if it uses TextMate then there must be something good about it.

Comments welcome.

Christian Clason

Nov 4, 2021, 8:59:40 AM

I'll just comment that I would take these comments about tree-sitter with a significant heap of salt.

Björn Linse

Nov 4, 2021, 9:19:33 AM

There might be some misunderstanding here. Tree-sitter in Neovim doesn't use an external process like coc.nvim. The parser runtime is a C library embedded into the editor itself (in total no more lines of code than syntax.c + highlight.c in Vim itself); it parses the buffer in memory and produces a syntax tree that in-process plugins can use (for highlighting, but also for other purposes like text objects).

Imran H.

Nov 4, 2021, 1:09:40 PM

Right now the biggest problem with syntax highlighting is how inconsistent and unpredictable it is. A unified interface will be more than worth the effort.

TextMate will probably be better for keeping the syntax system more integrated & backwards compatible than using something like treesitter. Also the modular and overengineered plugin architecture of treesitter would be a huge departure from the way it is done right now, so we should be a little cautious about how much functionality to reimplement.

bfrg

Nov 5, 2021, 9:17:48 AM

@bfredl How much longer does it take to load a larger file like src/evalfunc.c in Neovim when tree-sitter is enabled, compared to the default syntax highlighting? I'm assuming that default syntax highlighting is disabled for filetypes where tree-sitter is supported.

Björn Linse

Nov 5, 2021, 10:00:02 AM

@bfrg src/evalfunc.c from vim (10 000 lines) takes 80 ms more time with tree-sitter enabled for the initial parse (200ms compared to 120ms in my config)

mg979

Nov 6, 2021, 8:45:55 AM

treesitter is more than just syntax highlighting, it's also useful for text objects for example.

TextMate system is old, Sublime Text has been mentioned but it left it years ago to use its own syntax engine. Does it make sense to adopt a system that is already waning? And how big is its library if it must be included?

Also, when saying that a system is more performant, some source/benchmark should be provided. Is TextMate more performant than treesitter? Who says so?

Linwei

Nov 6, 2021, 1:10:27 PM

@bfredl , thanks for figuring it out, and I made a new revision:

list of tree-sitter disadvantages:

  • power consumption: yes tree-sitter is powerful, it will generate AST in real-time, but I am just talking about syntax highlighting, not textobj or indentation. AST generation has its price (see nvim-treesitter/nvim-treesitter#1292).
  • reliability: nvim-treesitter needs to load an external shared library as the parser for each language. The shared library must be downloaded and compiled into .so files (I know :TSInstall can simplify these steps); the build process can break if gcc/clang is not installed, and the plugin or neovim itself may break due to any of the common dynamic link library problems, e.g. version incompatibility when the plugin has been updated but the parser .so files have not, or dependency conflicts when loading the shared library.
  • poor parser quality: tree-sitter is a great project, but language parsers are contributed by people all over the world and their quality is not under control (performance issues or inconsistent behavior in different languages).

The biggest risk is parser quality, with more than 100 open issues for parsers:

examples for inconsistency:

example for performance:

The parser quality problem is totally out of control, nearly impossible for us to fix all the parsers one by one.

Linwei

Nov 6, 2021, 2:02:07 PM

@mg979 the core part of textmate syntax system is oniguruma, which is open source and well maintained by the community.

known editors / ides using textmate syntax system:

  • vscode
  • textmate itself
  • eclipse
  • jetbrains

Monarch was initially built to support languages in VS Code. Then, they decided to switch to TextMate as well, for the reasons listed here: microsoft/vscode#174 (comment).

Some details:

VS Code's tokenization engine is powered by TextMate grammars. TextMate grammars are a structured collection of regular expressions and are written as a plist (XML) or JSON files. VS Code extensions can contribute grammars through the grammar contribution point.

The TextMate tokenization engine runs in the same process as the renderer and tokens are updated as the user types. Tokens are used for syntax highlighting, but also to classify the source code into areas of comments, strings, regex.

Starting with release 1.43, VS Code also allows extensions to provide tokenization through a Semantic Token Provider. Semantic providers are typically implemented by language servers that have a deeper understanding of the source file and can resolve symbols in the context of the project. For example, a constant variable name can be rendered using constant highlighting throughout the project, not just at the place of its declaration.

Highlighting based on semantic tokens is considered an addition to the TextMate-based syntax highlighting. Semantic highlighting goes on top of the syntax highlighting. And as language servers can take a while to load and analyze a project, semantic token highlighting may appear after a short delay.

it is easy to implement textmate syntax highlighting

The tokenizer of vscode/textmate is:

And here is the wrapper in javascript, it's neatly written and not hard to understand:

All we need to do is rewrite the JavaScript wrapper in C.


And thousands of textmate syntax files are ready to use.

lacygoill

Nov 6, 2021, 3:04:02 PM

> No more than 4854 lines (including comments) in javascript/typescript

Tests excluded, it's 3779 lines of code (source: cloc(1)).
Tests included, it's 5074 lines of code.

lacygoill

Nov 6, 2021, 3:05:46 PM

Why not Sublime grammar instead of TextMate grammar? It seems more powerful, and easier to read.

> I think .sublime-syntax is more easy to write and readable.

source

> Sublime text 3 has implemented a new grammar format that seems much better than the traditional textmate grammar.

source

Is it because there have been fewer .sublime-syntax files written than .tmLanguage ones? Is there a licensing issue with these files?

Linwei

Nov 6, 2021, 3:12:13 PM

@lacygoill maybe TextMate grammar is a little easier? Because there are reference implementations:

But sublime is closed source ? we need write it from scratch ??

lacygoill

Nov 6, 2021, 3:31:20 PM

> But sublime is closed source ? we need write everything from scratch ??

Good point. I forgot that sublime was closed source.


Is TextMate much better (readability, reliability, performance) than our current syntax highlighting mechanism?

Just for TypeScript alone, there have been 754 reported bugs, 41 remaining open currently.

Assuming we support TextMate, what would happen to our current issues related to syntax highlighting? Do we close them, and tell their authors to use the new syntax highlighting mechanism? If the users find issues in TextMate grammar files, do we accept their reports on this bug tracker? IOW, is it going to help reduce the number of remaining open issues here?

Linwei

Nov 6, 2021, 3:48:25 PM

Because TypeScript is a new language that evolves quickly?

Oniguruma plus a JSON-like config is certainly faster than Vim's current mechanism. People seldom encounter performance issues in syntax highlighting when using textmate/vscode/eclipse/jetbrains.

Sublime's grammar seems more readable and powerful than TextMate's; maybe oniguruma+config can achieve such a thing.

lacygoill

Nov 6, 2021, 6:32:17 PM

I remember an issue where Vim was very slow when adding/removing text properties on CursorMoved. It only occurred while the syntax highlighting was enabled. So, one might think that the latter was the culprit. It turns out that the syntax highlighting was fine; the issue was Vim redrawing the screen too much.

With regard to how people perceive the current syntax highlighting as being too slow, I wonder which part of the issue comes from the syntax highlighting itself, and which part from something else (like too much redrawing).

> People seldom encounter such issues in syntax highlighting when using textmate/sublime2/vscode/eclipse/jetbrains.

That's interesting. I hope it's really thanks to their own syntax highlighting mechanism, and not some other optimizations (like multithreading).

mg979

Nov 7, 2021, 6:43:14 AM

A couple of remarks:

  • with vim system it's really easy to add custom groups to extend current syntax in after/syntax, would it be possible to do that with TextMate as well?
  • as @lacygoill said, there could be other bottlenecks (too much redrawing), that would limit TextMate performance in the same way, isn't it better to investigate those first?
  • programs with GUI use multithreading and this surely helps them
  • sometimes slow syntax highlighting depends on how (badly) the syntax script is written (for example, the default vimscript syntax is obscenely slow), and it would be faster with some changes in the script

I think performance of vim syntax highlighting could be improved before trying alternatives, for example:

  • there are known problems with folding, it would help to fix those
  • how much of the syntax is recalculated in insert mode? I think only the part of text from the insertion point up to the last visible line in the window should be recalculated, is this the case or does vim do a full update on every keystroke?
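As an illustration of the last point, here is a minimal sketch in C of one common way to bound the work after an edit: cache the lexer state at the end of every line, re-scan from the changed line, and stop as soon as a line's new end state equals its cached one or the last visible line has been passed. This is not a description of what Vim actually does; the names `LexState`, `scan_line`, `scan_all` and `rehighlight_from` are hypothetical, and the only state tracked is "inside a C block comment" (strings and `//` comments are ignored for brevity).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_NORMAL = 0, ST_BLOCK_COMMENT = 1 } LexState;

/* Scan one line and return the state at its end, given the state at
 * its start. Only C-style block comments are tracked in this sketch. */
static LexState scan_line(const char *line, LexState st)
{
    for (size_t i = 0; line[i] != '\0'; i++) {
        if (st == ST_NORMAL && line[i] == '/' && line[i + 1] == '*') {
            st = ST_BLOCK_COMMENT;
            i++;
        } else if (st == ST_BLOCK_COMMENT
                   && line[i] == '*' && line[i + 1] == '/') {
            st = ST_NORMAL;
            i++;
        }
    }
    return st;
}

/* Full scan: fill the per-line end-state cache from scratch. */
void scan_all(const char **lines, size_t nlines, LexState *end_state)
{
    LexState st = ST_NORMAL;
    for (size_t i = 0; i < nlines; i++) {
        st = scan_line(lines[i], st);
        end_state[i] = st;
    }
}

/* Re-scan after line `changed` was edited. Stops early when a line's
 * end state matches the cache (later lines cannot be affected) or when
 * the last visible line has been processed. Returns the number of
 * lines scanned; *stale_from receives the first line whose cached
 * state may be outdated, or nlines if the cache is fully valid. */
size_t rehighlight_from(const char **lines, size_t nlines,
                        LexState *end_state, size_t changed,
                        size_t last_visible, size_t *stale_from)
{
    size_t scanned = 0;

    assert(stale_from != NULL);
    *stale_from = nlines;

    for (size_t i = changed; i < nlines; i++) {
        if (i > last_visible) {
            *stale_from = i;  /* below the window: defer the work */
            break;
        }
        LexState start = (i == 0) ? ST_NORMAL : end_state[i - 1];
        LexState now = scan_line(lines[i], start);
        scanned++;
        if (now == end_state[i])
            break;            /* converged: the rest is unchanged */
        end_state[i] = now;
    }
    return scanned;
}
```

With this scheme, typing ordinary text inside a line costs one line scan, while opening a block comment costs a scan down to the bottom of the window, with everything below it marked stale until it scrolls into view.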

Stephan Seitz

Nov 8, 2021, 7:12:29 PM

I want to add that we currently have no safe-guards for tree-sitter that are applied for regex-based highlighting like limiting the line number or doing background parsing like Atom would do.

> Background syntax highlighter is still immature

I think background syntax highlighting (if you refer to asynchronous or separate threads highlighting) is neither implemented for tree-sitter nor for traditional vim highlighting. The possibility to make a fast thread-safe copy of the parsing state for tree-sitter or any other kind of multithreading is not used at the moment in Neovim.

Many of the issues you cited complained about features missing due to missing :h syntax. It will always be difficult to transition from one syntax system to another, especially when it is as widely supported as Vim's syntax/fold/indent files. Maybe it would be easier to maintain compatibility with a system that works more similarly.

About the quality of the grammars, you surely have different trade-offs. VS Code has significantly more users than Atom and nightly Neovim. Tree-sitter parses the whole document, which can help with complex syntax constructs and large-scale structure. However, it gets confused more easily when it sees something that cannot be handled by the language grammar (preprocessor constructs or non-standard language extensions), while regexes with a more local view are often still OK. How well error recovery works depends a lot on how the concrete grammar is written. Tree-sitter provides something in between regex highlighting and LSP-like semantic highlighting, so it might not be necessary if the latter two are available for a language. Distributing binaries is another challenge for tree-sitter. Arbitrary code execution through custom scanners enables the highest flexibility, but may also pose a security risk if the parsers are not self-generated and the scanner code is not reviewed.

Andrey Mishchenko

Nov 9, 2021, 2:55:47 PM

For those who haven't seen it, this is an excellent introduction to Tree-sitter, by the author: https://www.youtube.com/watch?v=Jes3bD6P0To&ab_channel=StrangeLoopConference

tl;dr: Tree-sitter is a (portable, dependency-free) C library which (conceptually) takes a grammar (expressed in JavaScript) and a source file, and returns a parse tree for the source file with respect to the grammar. The big selling point is that TS (claims that it) can handle syntax errors well (still return a reasonable parse tree) and that it is incremental (returns new parse trees efficiently/quickly given some code edits and previous trees).

Parsers for different languages are provided by the community and while I haven't seen this first-hand, I find it easy to believe that many of them are not great. But the project is much younger than TextMate, and GitHub uses it for its on-web syntax highlighting so there might be some corporate support there.

Personally, the thing I would be most excited about seeing is Vim exposing a representation of the syntax tree which can be used not just for syntax coloring but also for semantic editing (expand visual selection one AST node up, copy function body, etc.). IDK how well the Vim architecture supports this today. But in theory you could then plug in whatever parse-tree-generator you choose (Tree-sitter or TextMate).

If you are using an LSP language server, it's true that the LS can give you a parse tree (one which is even more accurate, esp. in the case of context-sensitive grammars like C++), but a language server (which Vim also doesn't natively support yet) will always be slower (it does more than a parser; for example, it resolves cross-file deps and so on) and will therefore have to be async and higher-latency. So I think there is room for both a fast incremental parse system (like Tree-sitter) and LSP support (for things like go-to-definition and find usage).

See also this discussion in the VSCode repo: microsoft/vscode#50140

fcurts

Nov 19, 2021, 2:14:19 PM

As someone who has spent months writing and maintaining TextMate and tree-sitter grammars for real-world languages, let me tell you that the TextMate grammar system is totally broken, at least from a 2021 perspective. TextMate grammars are a nightmare to maintain and impossible to get right. Out of desperation, I even developed my own macro system (just like the authors of TypeScript's TextMate grammar), and it was still a nightmare.

tree-sitter is in a completely different league. It's a top-notch incremental parser that can be used for accurate (!) syntax highlighting, code folding, code formatting, etc. tree-sitter grammars are dramatically easier to write and maintain, and it's actually possible to get them right. GitHub has been using tree-sitter for a while, and VSCode is also starting to use it (see https://github.com/microsoft/vscode-anycode).

Betting on TextMate grammars in 2021 would be an engineering crime.

Imran H.

Nov 20, 2021, 7:19:14 AM

I am not sure how much of your hyperbolic speech can be deemed accurate, but from what I can see one of the biggest problems with tree-sitter is the generally low quality of the parsers contributed by different people, as pointed out by the OP. "Top-notch" is not the way I would describe it. This certainly needs to be taken into account, as it would require a vast amount of effort to deal with the issues Vim would inherit as a result of undertaking the HUGE project of integrating tree-sitter.

I can't speak for textmate grammar for lack of familiarity. Personally, my biggest problem with tree-sitter (at least the way neovim does it) is its dependency on the environment (gcc/clang), its large binary size, and the do-it-all mentality, which suits neovim but definitely does not feel like the "vim way".

Bram Moolenaar

Nov 20, 2021, 7:23:34 AM


> As someone who has spent months writing and maintaining TextMate and
> tree-sitter grammars for real-world languages, let me tell you that
> the TextMate grammar system is totally broken, at least from a 2021
> perspective. TextMate grammars are a nightmare to maintain and
> _impossible_ to get right. Out of desperation, I even developed my own
> macro system (just like the authors of TypeScript's TextMate grammar),
> and it was still a nightmare.
>
> tree-sitter is in a completely different league. It's a top-notch
> incremental parser that can be used for accurate (!) syntax
> highlighting, code folding, code formatting, etc. tree-sitter grammars
> are dramatically easier to write and maintain, and it's actually
> possible to get them right. GitHub has been using tree-sitter for a
> while, and VSCode is also starting to use it (see
> https://github.com/microsoft/vscode-anycode).
>
> Betting on TextMate grammars in 2021 would be an engineering crime.

Thanks for your opinion. Making it easier/simpler/better to write a
parser is an important goal. So we should look at the best way to use
tree-sitter. That it compiles each parser into an executable seems like
a disadvantage. Perhaps this is OK for often used languages, but a way
to add a parser at runtime would be really useful.


jgb

Nov 20, 2021, 8:08:34 AM

> tree-sitter is in a completely different league. It's a top-notch incremental parser that can be used for accurate (!) syntax highlighting, code folding, code formatting, etc. tree-sitter grammars are dramatically easier to write and maintain, and it's actually possible to get them right. GitHub has been using tree-sitter for a while, and VSCode is also starting to use it (see https://github.com/microsoft/vscode-anycode).

If tree-sitter is top-notch, how come a ubiquitous and highly popular language like Python has been broken in it for quite a while?
When I tested neovim 0.5.1 with tree-sitter I ended up having to disable TS for python (which is the language I use the most) because the indenting and highlighting were unusable. Doesn't exactly inspire confidence.

Christian Clason

Nov 20, 2021, 8:23:12 AM

I think this discussion is devolving more and more from the purely technical and into prejudices. It is very important here to distinguish

  1. tree-sitter (the engine, which I would agree with @fcurts is an excellent piece of software and fundamentally superior to other syntax engines);
  2. Neovim's integration of tree-sitter, which is still marked "experimental" for a reason (and should be further separated into the fundamental integration and API in core -- which works rather well already -- and its use for syntax highlighting, folding, indentation etc. -- which is very much work in progress);
  3. The individual language parsers (and queries), which are externally maintained.

I think Vim should at this stage focus on 1. to make a reasoned decision (while it of course makes good sense -- and would make me very happy -- to take Neovim's approach and decisions for 2. into account; admitting that the two projects have different needs).

And I find it highly disingenuous to point fingers at 3. while ignoring that the quality of TextMate grammars (and, indeed, Vim's bundled syntax files) varies wildly as well. It's clear that (just like Neovim) you cannot simply switch engines and have to support both (on a per-language basis) for some time until the replacement catches up.

fcurts

Nov 20, 2021, 9:44:26 AM

I was obviously talking about the engine, which is what matters in the long run. Regarding existing grammars, the difference is that tree-sitter grammars can be improved relatively easily because they can be reasoned about. On the other hand, improving real-world TextMate grammars is anywhere from difficult to impossible. (Often, fixing one problem causes an inexplicable problem somewhere else, which is only discovered later.)

I can't comment on integration aspects. I'm not even a Vim user. But as a language/tooling developer myself, I feel strongly that it's time to move past TextMate grammars, which is why I offered my insights. Good luck!

Stephan Seitz

Nov 20, 2021, 10:32:58 AM

> If tree-sitter is top-notch, how come a ubiquitous and highly popular language like Python has been broken in it for quite a while?
> When I tested neovim 0.5.1 with tree-sitter I ended up having to disable TS for python (which is the language I use the most) because the indenting and highlighting were unusable. Doesn't exactly inspire confidence.

@jgb Indentation has nothing to do with tree-sitter itself. There is a very ad-hoc implementation that uses the parsed tree as indentexpr. Python indentation is not working because this implementation only considers the syntax node you are currently on, which is nothing in the case of the Python parser, because the relevant syntax node ended on the previous line when you start a new one. One would have to add a rule that respects this case or tune the general logic at this point.

You always have to write some system that translates your parsed representation to indents. The quality of this translation says nothing about the quality of the representation itself.

Isopod

Dec 23, 2021, 10:14:18 AM

As someone who recently spent some time writing a TreeSitter grammar, I have also become less enthusiastic of the project. I watched the author’s presentation a while ago and it sounded like the greatest invention since sliced bread, but in practice it doesn’t always work that well.

The biggest obstacle in my opinion is languages with preprocessors (e.g. C and C++). This isn’t something I had considered initially, but it is simply impossible to parse those languages with TreeSitter because you’re dealing with a language within a language. Now before someone mentions this: I know TreeSitter supports injections, e.g. JavaScript in HTML, but that’s not the same thing because, as I understand, each injection is essentially its own “program”. It’s fundamentally not possible to parse pre-processed languages with a context-free grammar. If you think about it, conditional compilation is as context-sensitive as it gets.

I’m talking about constructs like this:

#if FLAG
  if (foo) {
#endif

  bar;

#if FLAG
  }
#endif

Or this:

#define BEGIN_FUNC void func(void) {
#define END_FUNC }

BEGIN_FUNC
  bla;
END_FUNC

Or this:

#define RENAME(x) renamed_ ## x

void RENAME(my_func)(void) {
  bla;
}

How is TreeSitter supposed to generate an AST for such code if it doesn’t interpret the macros? It’s simply impossible. And often this will result in parse errors. Now, TreeSitter is in theory “fault tolerant”, so it should be able to recover from errors, but I’ve found that it often recovers in a weird, unpredictable way that causes syntax highlighting to be messed up. It gets even worse when we’re talking about using it for features like syntax-aware selections, indentations and folds: Just forget about it.

All TreeSitter grammars for preprocessed languages contain hacks to work around this issue, but they never work 100%. They just handle a few special cases, but blow up in the general case.

The next problem is that parsing is incredibly slow. I benchmarked parsing a 4 MB file and it took over a second. Depending on where you are coming from, that might not sound too bad, but 4 MB a second really isn’t impressive when you consider that modern RAM can handle tens of gigabytes per second. Quite frankly, I’m not sure this “incremental parsing” approach is all that useful when the implementation is so slow in practice. I guarantee I could write a hand-rolled parser that would just reparse the entire file on every edit and it would still be orders of magnitude faster.

I’ve also found that syntactic highlighting doesn’t actually add that much value over a simple lexer, but it is significantly more complex. Semantic highlighting on the other hand is even more complex, but it also adds a lot of value. If I had to rate the cost-benefit relationship, I’d say: lexer > semantic > syntactic.

If I had to design a syntax highlighting system from scratch, I’d probably just go with a simple C API, something like this:

typedef enum {TOK_IDENT, TOK_STRING, TOK_OPERATOR, ...} Token;

void highlight_tokens(const char *buf, size_t len, Token *tokens, const void *input_state, void *output_state, size_t state_size);

You just pass a chunk of data to the parser and then it returns a buffer with a character class for each character (or maybe an array of ranges, see also LSP for a similar approach). This is the most general form, giving you the greatest amount of flexibility. You could hand-roll a parser, or build one based on regexes or TreeSitter grammars or whatever. It doesn’t restrict you to a particular system.
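A minimal sketch of what an implementation behind that signature could look like, as a toy hand-rolled lexer. Everything beyond the signature is an assumption made for illustration: the extra classes `TOK_OTHER` and `TOK_COMMENT`, the `LexerState` struct, and the simplifications that chunks are split at line boundaries and that strings never span chunks and contain no escapes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* TOK_OTHER and TOK_COMMENT are additions for this sketch. */
typedef enum {
    TOK_OTHER, TOK_IDENT, TOK_STRING, TOK_OPERATOR, TOK_COMMENT
} Token;

/* Hypothetical persistent state: only "inside a block comment". */
typedef struct {
    int in_block_comment;
} LexerState;

/* Writes one character class per input byte into tokens[0..len-1].
 * input_state may be NULL (start of file); output_state may be NULL
 * if the caller does not need the state at the end of the chunk. */
void highlight_tokens(const char *buf, size_t len, Token *tokens,
                      const void *input_state, void *output_state,
                      size_t state_size)
{
    LexerState st = { 0 };
    size_t i = 0;

    assert(state_size == sizeof st);
    if (input_state != NULL)
        memcpy(&st, input_state, sizeof st);

    while (i < len) {
        unsigned char c = (unsigned char)buf[i];

        if (st.in_block_comment) {
            tokens[i] = TOK_COMMENT;
            if (c == '*' && i + 1 < len && buf[i + 1] == '/') {
                tokens[i + 1] = TOK_COMMENT;
                st.in_block_comment = 0;
                i += 2;
            } else {
                i++;
            }
        } else if (c == '/' && i + 1 < len && buf[i + 1] == '*') {
            tokens[i] = tokens[i + 1] = TOK_COMMENT;
            st.in_block_comment = 1;
            i += 2;
        } else if (c == '"') {
            tokens[i++] = TOK_STRING;           /* opening quote */
            while (i < len && buf[i] != '"')
                tokens[i++] = TOK_STRING;
            if (i < len)
                tokens[i++] = TOK_STRING;       /* closing quote */
        } else if (isalpha(c) || c == '_') {
            while (i < len && (isalnum((unsigned char)buf[i])
                               || buf[i] == '_'))
                tokens[i++] = TOK_IDENT;
        } else if (c != '\0' && strchr("+-*/=<>!&|", c) != NULL) {
            tokens[i++] = TOK_OPERATOR;
        } else {
            tokens[i++] = TOK_OTHER;
        }
    }

    if (output_state != NULL)
        memcpy(output_state, &st, sizeof st);
}
```

The caller would keep one `LexerState` per chunk boundary and pass the output state of one chunk as the input state of the next, so a block comment opened in one chunk keeps its class in the following one. Dropping the two state parameters gives the simpler "reparse the whole file" variant.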

I’d even consider getting rid of the state persistence stuff and just pass one large buffer containing the entire file and reparse the whole file every time. Because in the general case, you have to do it anyway. Consider putting a comment /* at the beginning of a very large file. No matter what you do, sometimes, you’ll have to reparse everything, so I’m not sure it is even worth adding complexity to save time for only some edits. Better work on making the parser really fast. Computers are fast, it shouldn’t take that long to parse even a 100 MB file. And source files are usually much smaller than this.



Linwei

Dec 23, 2021, 2:10:00 PM

Anyone who eagerly promotes tree-sitter here should answer my questions above first. Repeating its advantages a thousand times does not mean that these fatal problems will disappear.

Tree-sitter is not a new thing, no need to be so excited. Remember that Atom adopted tree-sitter back in 2018, and users in the Atom community are very calm about this "new" feature.

I don't need a better highlighter at the cost of performance and flexibility. I am suffering from performance issues right now, and all I want is fast, static, regex-based highlighting.

@lacygoill you claimed in this comment that the problem was caused by "drawing too much".

That's not true, I have done a bisect investigation in this problem here:

I found that there was a big performance regression after 8.0.643 and 8.0.647. You can simply compare the syntax highlighting speed of Vim 7.4 and the latest Vim 8.2.xxxx, and you will find that this is by no means a simple "drawing too much" problem.



lacygoill

Dec 23, 2021, 4:31:30 PM

> @lacygoill you claimed in this comment that the problem was caused by "drawing too much".
> That's not true,

It is. The patch that fixed my issue only reduced how often Vim was redrawing the screen:

diff --git a/src/textprop.c b/src/textprop.c
index b6cae70a8..e74c13849 100644
--- a/src/textprop.c
+++ b/src/textprop.c
@@ -809,6 +809,7 @@ f_prop_remove(typval_T *argvars, typval_T *rettv)
     int                id = -1;
     int                type_id = -1;
     int                both;
+    int                is_removed = FALSE;

     rettv->vval.v_number = 0;
     if (argvars[0].v_type != VAR_DICT || argvars[0].vval.v_dict == NULL)
@@ -889,6 +890,7 @@ f_prop_remove(typval_T *argvars, typval_T *rettv)
                if (both ? textprop.tp_id == id && textprop.tp_type == type_id
                         : textprop.tp_id == id || textprop.tp_type == type_id)
                {
+                   is_removed = TRUE;
                    if (!(buf->b_ml.ml_flags & ML_LINE_DIRTY))
                    {
                        char_u *newptr = alloc(buf->b_ml.ml_line_len);
@@ -920,7 +922,8 @@ f_prop_remove(typval_T *argvars, typval_T *rettv)
            }
        }
     }
-    redraw_buf_later(buf, NOT_VALID);
+    if (is_removed)
+       redraw_buf_later(buf, NOT_VALID);
 }

As anyone can see, the patch did one thing, and one thing only: it put a condition on redraw_buf_later(); the latter can only be invoked if is_removed is true:

if (is_removed)

It did nothing else. And yet, it was enough to fix the issue.
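The control flow of that patch can be sketched in a few lines of Python. This is only an illustration of the pattern, not Vim's code: the `Buffer` class, its fields, and `redraw_later` are hypothetical stand-ins for `buf`, its text properties, and `redraw_buf_later()`.

```python
class Buffer:
    """Hypothetical stand-in for a Vim buffer with text properties."""

    def __init__(self, props):
        self.props = list(props)        # (id, type_id) pairs
        self.redraws_requested = 0

    def redraw_later(self):
        self.redraws_requested += 1


def prop_remove(buf, prop_id):
    """Remove every property with `prop_id`; redraw only if something changed."""
    before = len(buf.props)
    buf.props = [p for p in buf.props if p[0] != prop_id]
    removed = before - len(buf.props)
    if removed:                         # mirrors the patch's `if (is_removed)` guard
        buf.redraw_later()
    return removed
```

Calling `prop_remove` with an id that matches nothing leaves `redraws_requested` untouched, which is the whole effect of the guard.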


I have done a bisect investigation in this problem here:

Syntax highlighting is extremely slow when scrolling up in recent version (v8.0.1599) #2712

This has nothing to do with my comment. It's an entirely different issue. The only way your comment might be relevant would be if I had written:

whenever Vim is slow, it's because it redraws the screen too much

But I did not say that. And the comment you link did not say that either.

I wrote that in my issue, the cause was too much redraw.
I did not write that in all issues, the cause was too much redraw.


Two last notes before I unsubscribe from this thread.

  1. Asking questions or requesting clarification is OK, but saying that I lie is not. I don't want to read anything from you anymore, so I've blocked you.

  2. I don't care whether Vim integrates tree-sitter, TextMate, or whatever software is trending right now. All I care about is how reliable Vim is.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1000531735@github.com>

Linwei

unread,
Dec 24, 2021, 4:23:34 AM12/24/21
to vim/vim, Subscribed

@lacygoill, sad to hear that. I have been following you on GitHub for years, reading your posts in the issues, and studying your early vim9 plugin projects. All I meant was "your speculation may be wrong". Saying that I accused you of lying is a bit of an overreaction.

You just blocked a faithful follower.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1000746236@github.com>

icedman

unread,
Mar 22, 2022, 2:11:10 AMMar 22
to vim/vim, Subscribed

I made the TextMate parser portable by removing the OS X Foundation code. It may be worth testing as a Vim plugin, but making one is beyond my skill set.

https://github.com/icedman/tm-parser

This library works well in my editor projects, including an ncurses-based editor. It also works well enough in my Flutter app, Ashlar Code for Android (munchyapps.com).


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1074776873@github.com>

Martin Tournoij

unread,
Jun 3, 2022, 12:03:51 PMJun 3
to vim/vim, Subscribed

Last year I wrote a plugin to highlight things with LPeg; I chose LPeg because I liked the way it works, vis already uses it, and there are a reasonable number of syntax files already available.

I got bored with it and never finished/published it; I think there were still some remaining issues, but I forgot what they are/were. I think most were related to using text properties to apply highlights, rather than LPeg itself, but not sure. Maybe I'll work on it some more and get it to at least a "publishable/experimental" state.

I also spent quite some time looking at tree-sitter; actually, that was what I originally wrote the plugin for, and came to the conclusion I don't care much for tree-sitter, or at least not for editors. One of the really great features of Vim's current syntax highlighting is that it's pretty easy to modify by users. Based on my experience answering questions on the Vi Stack Exchange people want to do this all the time: they want to highlight some keywords as errors; don't like how this or that is highlighted and want something different, they want to highlight their own project-specific things, etc. Tree-sitter makes that much harder, and I'd consider it a huge UX regression.

Even in "normal" usage there's an entire circus around managing it for end-users; you can't just "drop a file in ~/.vim/syntax/mylanguage.vim" or "~/.vim/after/syntax/mylanguage.vim", you need to compile shared objects with NodeJS and whatnot. The nvim-treesitter plugin manages all of that for you, but a plugin to manage all the circus is putting lipstick on an ugly pig IMO.

I also don't like the way tree-sitter syntax files are written in the first place; other people mentioned that many tree-sitter highlights aren't all that great, and that matches my experience too. My first instinct was "okay, so let's improve this!" but I found that quite hard and gave up after mucking about for a while with very limited success. I think that syntax being hard to write in tree-sitter is probably the reason so many syntaxes aren't so great in the first place. I certainly don't see how tree-sitter is "fundamentally superior to other syntax engines" as someone mentioned in this thread; this seems like a truism that keeps getting repeated, but I haven't seen any reasons why this should be the case (and I did try to find reasons).

Overall I do think the "tree-sitter approach" of more structured parsing is the better approach, I just don't think that tree-sitter is an especially great fit for Vim. I don't know why Neovim went with tree-sitter specifically: as near as I can determine it's just because someone wrote a patch for that – I couldn't really find any discussions about it. Interestingly Neovim does use LPeg internally for some things, I don't know if it was considered – or maybe it was, I very well may have missed some discussions somewhere.


I don't have any opinion on TextMate's system, as I didn't look at it, but when I started working on all of this and evaluating options I wrote down the following requirements:

  1. Reasonably fast, even for large files, and it doesn't break.

  2. Reasonably easy to modify, including by "normal" users such as sysadmins, scientists (in fields other than comp-sci), and just regular hobbyists who are not professional developers.

  3. Readability and maintenance are important. Right now syntax files are a bit of a "write only, hopefully never read"-affair.

  4. Easy to manage, it should "just work" after dropping a new file in your ~/.vim/ without muckery.

There are a million-and-one parser generators, tools, and so forth out there. It's literally people's entire career to research these kinds of things and write tools for them.

Many of them fit requirement 1 ("fast and correct"), but most of them are not especially user-friendly. EBNF (and variants thereof) are more or less the standard for describing languages, but do you really want this as the basis for your syntax highlighting? Probably not.

This is actually a great feature of the current syntax system: you can add, remove, and modify things fairly easily. "I don't like this highlight" or "I want to add a new highlight for X" should be something a fairly experienced dev can do in under an hour. LPeg mostly retains this feature: you can still say "yo dawg, highlight this for me, kthxbye" or "eww, I don't like this, get rid of it!" and be done with it.

Without detailing all the solutions I looked at, I eventually settled on LPeg because of all the solutions I found I felt it had the best combination of correctness and UX.

I still think these are good requirements. It's quite possible there are existing tools out there that do a better job than LPeg, but IMHO tree-sitter very much doesn't.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1146111454@github.com>

Martin Tournoij

unread,
Jun 3, 2022, 12:37:59 PMJun 3
to vim/vim, Subscribed

I put my LPeg plugin over here: https://github.com/arp242/lpeg.vim

Like I said in my previous comment, I haven't worked on it for quite a while, but I did some spot-checking and it seems to work fairly decently. Much of it is stolen^H^H^H^H^H^H inspired by vis.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1146161874@github.com>

Stephan Seitz

unread,
Jun 3, 2022, 1:37:41 PMJun 3
to vim/vim, Subscribed

If anyone wants to have a look at a native implementation (i.e. compiled, without runtime dependencies), I would look into bat: https://github.com/sharkdp/bat. It uses a native library called syntect (https://crates.io/crates/syntect/1.7.1) that reads TextMate grammars, which is probably good enough to try the idea out in Vim before writing a C implementation.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1146207788@github.com>

Linwei

unread,
Jul 8, 2022, 8:18:45 AMJul 8
to vim/vim, Subscribed

@theHamsta, Sublime grammar is also a good choice:

[image]


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1178923307@github.com>

Uriel Acioli

unread,
Aug 8, 2022, 2:08:16 AMAug 8
to vim/vim, Subscribed

tree-sitter's highlights have a lot of quality issues. Syntect, and therefore TextMate grammars, in Vim would be a game changer, at least for me. And since Rust has very good FFI support for C, I think it might be feasible to integrate Syntect's library with Vim.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1207705596@github.com>

Linwei

unread,
Aug 8, 2022, 6:19:24 AMAug 8
to vim/vim, Subscribed

Apart from tree-sitter's poor parser quality, the fatal issue with tree-sitter's highlighting is portability.

If we want to encourage people to create diverse syntax highlighting, we must provide something simple, straightforward, and easy to learn for most users.

When we are using text-based grammar files (vim syntax/TextMate/Sublime syntax), it is very easy to make modifications and create a new one. For example, I can change the cpp.vim to a new version to highlight some keywords/rules dedicated to my project or to meet the latest c++ standard if the original author is too busy to update.

By contrast, tree-sitter's syntax highlighting rules are hard-coded into the parsers. Even if you want to make a small change, you are required to change the parser yourself and build a new .so file for your target platform.

Changing a parser is much more complex than changing a text-based grammar file.
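As a rough illustration of why text-based rules are easy to extend, here is a toy regex highlighter in Python. The rule table and group names are made up for the example and do not correspond to the actual Vim, TextMate, or Sublime formats; the point is only that a project-specific rule is a one-line addition that leaves the engine untouched.

```python
import re

# Each rule is (highlight group, pattern); earlier rules win on overlap.
RULES = [
    ("Comment", r"//.*"),
    ("Keyword", r"\b(?:int|return|if|else)\b"),
    ("Number", r"\b\d+\b"),
]


def highlight(line, rules):
    """Return sorted (start, end, group) spans for one line."""
    spans = []
    taken = [False] * len(line)
    for group, pattern in rules:
        for m in re.finditer(pattern, line):
            start, end = m.start(), m.end()
            if end > start and not any(taken[start:end]):
                spans.append((start, end, group))
                taken[start:end] = [True] * (end - start)
    return sorted(spans)


# A user adds a project-specific rule without touching the engine.
my_rules = RULES + [("ProjectType", r"\b(?:u8|u32)\b")]
```

With `my_rules`, the line `u32 x = 42; // int` gains a `ProjectType` span for `u32`, while the `int` inside the comment stays part of the comment because the earlier rule already claimed it.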

BTW: Tree-sitter is written in Rust.

  1. Are all of our grammar authors capable of writing Rust now?
  2. Has everyone already installed a Rust development environment on their computer?

As far as I know, many Vim users still don't even have a gcc environment to build Vim themselves.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1207935408@github.com>

Andrey Mishchenko

unread,
Aug 8, 2022, 7:38:44 AMAug 8
to vim/vim, Subscribed

In case @skywind3000 decides to edit his post: he is boldly claiming that (1) Tree-sitter is written in Rust, (2) you have to write Rust code to create TS grammars, and (3) you cannot change TS highlighting at runtime. These claims are all very false. Since he has shown that he is willing to completely make things up to support his point, anything he says should be taken with a grain of salt.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1208014282@github.com>

Uriel Acioli

unread,
Aug 8, 2022, 7:46:17 AMAug 8
to vim/vim, Subscribed

@skywind3000 not all of them know Rust, but since tree-sitter has a C core wrapped in Rust (like Deno's V8), and because that gets built and delivered as an npm package, grammar authors do their thing in JavaScript.

But a TextMate-compatible parser like Syntect could probably be less of a hassle for the end user: just use the .json syntax files from VSCode/Sublime Text, and modifying a syntax would mean just editing some .json.

As it stands today, if you dislike a tree-sitter highlight, changing it requires writing a subset of Scheme and/or tweaking tree-sitter using bindings in a language supported by your editor.
Even then, you'd also need to get knee-deep into the third-party tree-sitter grammar. So although grammars are easy to make, editing them isn't as easy as extending a .json syntax file, like Sublime/TextMate grammars.

In the editor ecosystem, having a TextMate grammar means much less work to port extensions from TextMate, VSCode and Sublime Text to Vim than with tree-sitter.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1208021400@github.com>

Christian Clason

unread,
Aug 8, 2022, 7:54:02 AMAug 8
to vim/vim, Subscribed

In the editor ecosystem, having a TextMate grammar means much less work to port extensions from TextMate, VSCode and Sublime Text to Vim than with tree-sitter.

It's no skin off my nose either way, but just for the sake of completeness: going with tree-sitter would mean even less work porting from Neovim -- a "sister editor" that explicitly strives for Vim compatibility?


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1208028886@github.com>

Uriel Acioli

unread,
Aug 8, 2022, 8:21:15 AMAug 8
to vim/vim, Subscribed

It's no skin off my nose either way, but just for the sake of completeness: going with tree-sitter would surely mean even less work porting from Neovim -- a "sister editor" that explicitly strives for Vim compatibility?

@clason, going with tree-sitter means, for now, choosing Neovim compatibility over Sublime Text, VSCode and TextMate.
Following conventions means that features and innovations from other tools that follow those same conventions can be introduced into Vim with less work, as with the Language Server Protocol's conventions.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1208055682@github.com>

Linwei

unread,
Aug 9, 2022, 12:29:22 AMAug 9
to vim/vim, Subscribed

@clason, I admit that I was not aware of the parser generation part of tree-sitter; it was indeed my mistake to state that it is written in Rust.

A mistake is a mistake; I will not edit or revert my post.

But my core points still stand:

  1. It is harder to customize a parser, even if it is generated from a JS specification of the grammar.
  2. Anyone who really cares about semantic highlighting can use LSP clients like coc or ycm, which already provide a clangd-based highlighter that is more robust and precise.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1208898714@github.com>

errael

unread,
Aug 9, 2022, 12:10:23 PMAug 9
to vim/vim, Subscribed

Maybe the first thing to do is to define a syntax highlighting interface (SHI) for Vim. It could be set up such that anything adhering to the interface could be compiled into Vim or added as a shared library, and there could be an LSP adapter for SHI. The interface could support async/concurrent operation.

It's been mentioned that there are additional uses for a true language parser, such as folding info. Is it reasonable or useful to have multiple SHIs active at one time? Internally Vim could synthesize/merge the results from multiple sources.

Considering the heated interest in this topic, maybe Syntax/Highlighting Interface Tools is a better or more accurate name.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1209586519@github.com>

icedman

unread,
Aug 17, 2022, 10:57:45 PMAug 17
to vim/vim, Subscribed

I went ahead and made a TextMate plugin. It is currently for nvim only, though.

https://github.com/icedman/nvim-textmate

It is coded in C/C++ and Lua, and uses a modified version of MacroMates' open-source TextMate app.

It is nowhere near ready, but the speed already looks promising.

The syntax highlighting output is similar to Treesitter's. Treesitter has some other cool features, but it crawls when editing or even just opening large files. Example: the amalgamated sqlite3.c source (200k lines).


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1218974642@github.com>

icedman

unread,
Aug 31, 2022, 9:35:52 AMAug 31
to vim/vim, Subscribed

textmate-based syntax highlighting for vim
https://github.com/icedman/vim-textmate


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1232947947@github.com>

Stephan Seitz

unread,
Aug 31, 2022, 7:16:14 PMAug 31
to vim/vim, Subscribed

Does it mean that we can do everything through a .scm file without changing the parser ?

You can select every part of the parsing result and define relations between different CST nodes (e.g. select a function that has three arguments with the third starting with a vowel). You are a bit limited that you can only select nodes of the syntax tree, not individual characters directly (without custom functions).
Custom functions can be interpreted by the editor. They can also be used to select subranges of a node or to filter results with custom logic. That's usually enough for syntax highlighting. In Neovim's implementation, you can register such custom functions via Lua.
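To make the idea of selecting nodes and filtering them with a custom function concrete, here is a toy sketch in Python. It does not use tree-sitter or its query language: the `Node` class, the node type names, and `select` are hypothetical. It only mirrors the example above, picking functions with three arguments whose third argument starts with a vowel.

```python
from dataclasses import dataclass, field

VOWELS = set("aeiou")


@dataclass
class Node:
    type: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(root, node_type, predicate):
    """Return nodes of `node_type` that the custom `predicate` accepts."""
    return [n for n in walk(root) if n.type == node_type and predicate(n)]


def third_arg_starts_with_vowel(fn):
    """Custom filter: exactly three parameters, third one begins with a vowel."""
    args = [c for c in fn.children if c.type == "parameter"]
    return len(args) == 3 and args[2].text[:1].lower() in VOWELS


# A tiny hand-built tree standing in for a parse result.
root = Node("module", children=[
    Node("function", "f", [Node("parameter", "x"), Node("parameter", "y"), Node("parameter", "offset")]),
    Node("function", "g", [Node("parameter", "a"), Node("parameter", "b")]),
    Node("function", "h", [Node("parameter", "a"), Node("parameter", "b"), Node("parameter", "count")]),
])

matches = select(root, "function", third_arg_starts_with_vowel)
```

Only `f` is selected here. The selection works on whole nodes, which matches the limitation noted above: anything finer than a node needs extra logic inside the predicate.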

what if language standard evolves ? still no need to change the parser ??

Yes, changes to the language require updating, regenerating, and compiling the parsers. Like TextMate grammars, the parser definitions are shared between editors. With the tree-sitter integration into Neovim the community got quite active, so new features typically get added quite quickly. In the case of nvim-treesitter, each plugin revision contains a lockfile with the parser revisions we have tested on CI to be compatible with the highlight queries (when new language features should get highlighted, they must be referenced in the *.scm files unless the parser author chose to reuse already present structures). The parsers get updated and compiled on the end user's side as soon as the feature has gone through our CI and been committed (rolling release). Other distribution strategies include managing the parsers via a plugin manager or via binary releases (a parser pack, or the regular release of Neovim, which includes the parser for the C language, with more to be added).

The parsers usually use the terminology of the language specification and can reuse BNF language specs if available. So there is mostly no need to customize the parser: customization can be done via SCM files, and the parser just follows official specs or existing parsers for the language. New parsers might have frequent updates in the beginning until they cover all features of a language, but at some point they are usually complete and only have a few commits a year. As with syntax files, it is definitely not necessary to always be on the latest revision.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1233540381@github.com>

icedman

unread,
Sep 1, 2022, 11:04:33 AMSep 1
to vim/vim, Subscribed

Try editing this in neovim with treesitter on:
https://code.jquery.com/jquery-3.6.1.js
10,000+ lines of code

Try even scrolling through sqlite3.c in neovim with treesitter
200,000+ lines of code

Plain vim has no problem with these files. Granted, it would be rare editing very large files. But when something vim could do previously well is no longer possible - it should be considered a regression.

The title of the proposal is simply a better syntax highlighting.

Treesitter should be a separate proposal or something for the future, perhaps for when Vim becomes multithreaded.


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234407652@github.com>

Christian Clason

unread,
Sep 1, 2022, 12:22:46 PMSep 1
to vim/vim, Subscribed

Plain vim has no problem with these files.

Yes, because Vim has a parsing timeout and a limited parsing window, which tree-sitter in Neovim does not (yet). It's important to compare apples with apples here. Unqualified claims like

And textmate is the best answer.

do not help; at the very least I would have expected a benchmark here comparing (fairly!) the timings between regex highlighting, nvim-treesitter, and your textmate plugin for these files.
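For readers unfamiliar with the "parsing timeout and limited parsing window" mentioned above, the idea can be sketched as follows in Python. This is a simplified illustration, not Vim's implementation (Vim's 'redrawtime' option is one real example of such a time budget); `highlight_window`, its parameters, and the injectable `clock` are hypothetical names chosen for the example.

```python
import time


def highlight_window(lines, top, height, highlight_line,
                     budget_s=2.0, clock=time.monotonic):
    """Highlight only lines[top:top+height], giving up once the budget is spent.

    Returns {line_number: result}. Lines that were skipped because of the
    timeout are simply absent, so they would be drawn without highlighting.
    """
    deadline = clock() + budget_s
    done = {}
    for n in range(top, min(top + height, len(lines))):
        if clock() > deadline:
            break
        done[n] = highlight_line(lines[n])
    return done
```

An engine without such limits does strictly more work on a huge file, which is why a fair benchmark would need to enable or disable the limits consistently for all engines being compared.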


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234506037@github.com>

Bram Moolenaar

unread,
Sep 1, 2022, 12:24:29 PMSep 1
to vim/vim, Subscribed


> Try editing this in neovim with treesitter on:
> https://code.jquery.com/jquery-3.6.1.js
> 10,000+ lines of code
>
> Try even scrolling through sqlite3.c in neovim with treesitter
> 200,000+ lines of code
>
> Plain vim has no problem with these files. Granted, it would be rare
> editing very large files. But when something vim could do previously
> well is no longer possible - it should be considered a regression.

It's not that rare. I worked on a compiler that produced C code, one
big file for the whole program. Others have mentioned generated XML.

If we are going to introduce a new way of syntax highlighting, it must
be able to handle this. Even when that is going to be difficult.

This means the parser must be able to start at some point in the file.
It may look back for a point to synchronize, but always starting at the
top of the file isn't going to be sufficient. It may very well mean
this is a different "mode" where some information is missing. While for
regular sized files everything is available.
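One way to picture "look back for a point to synchronize" is the following Python sketch. It is a toy built on loud assumptions and is not how Vim's `:syntax sync` works internally: the sync rule (a line starting with `}` in column 0 is taken to end a top-level block) and the name `find_sync_line` are invented for illustration.

```python
def find_sync_line(lines, first_visible, max_lookback=200):
    """Search backwards from `first_visible` for a line that starts with '}'.

    Returns (start_line, synced). Parsing can begin at `start_line`.
    `synced` is False when no sync line was found within the lookback
    limit, i.e. the parser starts with incomplete information.
    """
    lowest = max(0, first_visible - max_lookback)
    for n in range(first_visible - 1, lowest - 1, -1):
        if lines[n].startswith("}"):
            return n + 1, True
    # Reaching the top of the file is also a known-good starting state.
    return lowest, lowest == 0
```

The `synced == False` case corresponds to the different "mode" described above: highlighting still happens, but from a guessed state rather than a known one.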

Also keep in mind that it must be able to handle deleting and inserting
lines and still be fast. Also when that is a thousand lines. You don't
want to wait more than a second when moving code around.

Adding a new engine is going to be something that needs to be done
properly, it is a big investment.

--
It is too bad that the speed of light hasn't kept pace with the
changes in CPU speed and network bandwidth. -- ***@***.***>

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234508362@github.com>

errael

unread,
Sep 1, 2022, 12:58:21 PMSep 1
to vim/vim, Subscribed

Adding a new engine is going to be something that needs to be done properly, it is a big investment.

I hope that if/when the time comes for working on this, it is thought of as

Adding an engine interface allowing different implementations to be used

As discussed in #9087 (comment)


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234543393@github.com>

Linwei

unread,
Sep 1, 2022, 1:31:59 PMSep 1
to vim/vim, Subscribed

Some information:

I was reading vscode's latest documentation and found that:

In the end, it seems that VS Code chose not to integrate tree-sitter directly, but instead provides
APIs that allow extensions to supply new highlighting solutions:

Currently, vscode has two highlighting solutions:

Semantic highlighting is an addition to syntax highlighting as described in the Syntax Highlight guide. Visual Studio Code uses TextMate grammars as the main tokenization engine. TextMate grammars work on a single file as input and break it up based on lexical rules expressed in regular expressions.

Semantic tokenization allows language servers to provide additional token information based on the language server's knowledge on how to resolve symbols in the context of a project. Themes can opt in to use semantic tokens to improve and refine the syntax highlighting from grammars. The editor applies the highlighting from semantic tokens on top of the highlighting from grammars.
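The layering described in that quote, semantic tokens applied on top of grammar tokens, can be sketched in Python as below. This is not VS Code's API or data model; the `(start, end, scope)` tuples and the `overlay` function are assumptions made for illustration.

```python
def overlay(grammar_tokens, semantic_tokens, length):
    """Merge two token layers for one line; semantic tokens win on overlap.

    Tokens are (start, end, scope) with half-open column ranges.
    Returns merged (start, end, scope) runs; uncovered columns are omitted.
    """
    scopes = [None] * length
    for layer in (grammar_tokens, semantic_tokens):    # later layer overwrites
        for start, end, scope in layer:
            for col in range(start, min(end, length)):
                scopes[col] = scope
    merged, run_start = [], 0
    for col in range(1, length + 1):
        if col == length or scopes[col] != scopes[run_start]:
            if scopes[run_start] is not None:
                merged.append((run_start, col, scopes[run_start]))
            run_start = col
    return merged
```

For a line such as `foo(bar)`, a grammar might mark both names as plain identifiers, while a language server that knows `foo` is a function can refine just that span and leave the rest of the grammar's output intact.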


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234578114@github.com>

bfrg

unread,
Sep 1, 2022, 2:12:09 PMSep 1
to vim/vim, Subscribed

The TextMate parser everyone keeps referring to relies on the oniguruma regex library, which contains approximately 80k lines of code. Is integrating it into Vim even an option? Users would have to learn a new regex flavor just for writing syntax files. On the other hand, if Vim uses its own regex engine, the existing TextMate syntax files won't work, or will they?

I would like to see a comparison between Vim's syntax highlighting and TextMate for a more complicated filetype, like C++, bash or similar. The author keeps suggesting TextMate but hasn't shown anything (not even a screenshot comparison). Where exactly does TextMate shine? And what exactly is easier to express in TextMate's syntax files?


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234620623@github.com>

Bram Moolenaar

unread,
Sep 1, 2022, 2:55:20 PMSep 1
to vim/vim, Subscribed


> > Adding a new engine is going to be something that needs to be done
> > properly, it is a big investment.
>
> I hope that if/when the time comes for working on this, it is thought of as
> ```
> Adding an engine interface allowing different implementations to be used
> ```
> As discussed in https://github.com/vim/vim/issues/9087#issuecomment-1209586519

Providing an interface shifts the problem elsewhere, it doesn't solve
it. You still need a good endpoint for the interface, otherwise it's
useless. And unless that endpoint is one good solution, this will
result in multiple alternatives, making it more difficult for the user
who "just wants it to work". And for someone who wants to support a
certain language (without spending too much time on it) it creates extra
decisions to be made. It's a lot better to say "this is what you do"
instead of "you could do this, or that, or the other".

--
I have read and understood the above. X________________


/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///


You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/9087/1234660524@github.com>

Bram Moolenaar

unread,
Sep 1, 2022, 4:02:38 PMSep 1
to vim/vim, Subscribed


> The textmate parser everyone keeps referring to relies on the
> [oniguruma](https://github.com/kkos/oniguruma/) regex library which
> contains approximately 80k lines of code. Is this even an option to
> integrate it into Vim? Users will have to learn a new regex flavor
> just for writing syntax files. On the other hand, if Vim uses its own
> regex engine, all the existing textmate syntax files won't work, or
> will they?

Thank you very much for making this remark. I would say this is a deal
breaker. Not only because of using some non-standard regex syntax, also
because this library is, well, obscure? Have a look at one of the main
files: regcomp.c
(https://github.com/kkos/oniguruma/blob/master/src/regcomp.c)
There is no comment anywhere. No explanation of what the purpose of a
function is, what the arguments mean, nothing.
The file that looks like the main engine is riddled with C macros:
https://github.com/kkos/oniguruma/blob/master/src/regexec.c
And also no comments anywhere.

This makes it very difficult to maintain. Or perhaps I should say
"impossible"?

Also, what kind of regex engine is this? Backtracking, NFA, something
else? There is a FAQ - it has two entries, one of them says "there is
no mailing list".



> I would like to see a comparison between Vim's syntax highlighting and
> textmate for a more complicated filetype, like C++, bash or similar.
> The author keeps suggesting textmate but hasn't shown anything (at
> least a screenshot comparison). Where does textmate shine exactly? And
> what exactly is easier to express in textmate's syntax files?

I think it's a given that the current Vim syntax engine is not ideal.
It was initiated long ago, completely depends on runtime pattern
recognition, etc. It has been optimized over the years, and it's amazing
that it is still an acceptable working solution.

It does function as the base to compare with. Any new syntax engine
needs to work a lot better than the existing one, otherwise it's not
worth switching over.

--
BEDEVERE: And what do you burn, apart from witches?
FOURTH VILLAGER: ... Wood?
BEDEVERE: So why do witches burn?
SECOND VILLAGER: (pianissimo) ... Because they're made of wood...?
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD


/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims --