[vim/vim] [proposal] Can we introduce TextMate grammar system for syntax highlighting ? (Issue #9087)

Linwei

Nov 4, 2021, 1:24:23 AM

to vim/vim, Subscribed

Current problem

The current syntax highlighting system is very slow, and there are noticeable lags when scrolling large C++ files which contain complex syntax elements.

Issues of using a separated process

Previously, most people have suggested something like nvim-treesitter, which analyzes source code in a background treesitter process and renders keywords in the foreground with text properties.

But is it a good idea? I don't really think so. There are at least 4 disadvantages to treesitter solutions:

power consumption: an extra background job is required, which means less battery life and more carbon dioxide.
buffer sync: code must be written very carefully to guarantee that the source code in the two processes (vim/treesitter) is the same. coc/nvim-treesitter sends the whole buffer to the background every time changetick increases to prevent such problems, which is a little flaky.
reliability: an external program installed by the user is not reliable enough. There are plenty of version compatibility and environment errors, and people need to make extra effort to get syntax highlighting working.
poor parse quality: treesitter is a great project, but language parsers are contributed by people all over the world and their quality is not under control (performance issues or inaccurate results in certain languages).

Background syntax highlighter is still immature, there are still many other strange issues in nvim-treesitter:

https://github.com/nvim-treesitter/nvim-treesitter/issues

If we introduce something like this, we shall take all these issues into account.

TextMate grammar system

Syntax highlighting is the most important part of an editor, so it had better not rely on any external programs outside our control.

We need something new that satisfies the goals below:

good performance
robust and reliable
accurate
low power consumption
work in the same process (not require external programs)

TextMate's grammar engine is a good candidate: it is widely used in many IDEs and editors, including vscode (see syntax-highlight-guide for details), sublime and many others.

VS Code uses TextMate grammars as the syntax tokenization engine. Invented for the TextMate editor, they have been adopted by many other editors and IDEs due to the large number of language bundles created and maintained by the Open Source community.

TextMate grammars rely on Oniguruma regular expressions and are typically written as a plist or JSON. You can find a good introduction to TextMate grammars here, and you can take a look at existing TextMate grammars to learn more about how they work.

The grammar can be defined in JSON, which means it can be translated into viml or kept as plain JSON files.

Possible Solution

We can specify which grammar engine to use for the given buffer:

default engine: current vim's regex grammar
textmate engine: textmate grammar system.

And some new commands could be used to change the grammar engine:

:syntax grammar textmate
:syntax grammar default
:syntax load ~/.vim/syntax/cpp.json

For example, the snippet below could be included at the head of syntax files:

if has('textmate')
    syntax grammar textmate
    syntax load syntax/cpp.json
    finish
endif

....

And lots of existing vscode/textmate syntax files can be reused with minimal modification.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub.
Triage notifications on the go with GitHub Mobile for iOS or Android.

Bram Moolenaar

Nov 4, 2021, 7:18:42 AM

to vim/vim, Subscribed

Thank you for starting this discussion. I had a vague plan to look into integrating treesitter; it is good to know it also has disadvantages. Vscode is widely used, thus if it uses TextMate then there must be something good about it.

Comments welcome.

Christian Clason

Nov 4, 2021, 8:59:40 AM

to vim/vim, Subscribed

I'll just comment that I would take these comments about tree-sitter with a significant heap of salt.

Björn Linse

Nov 4, 2021, 9:19:33 AM

to vim/vim, Subscribed

There might be some misunderstanding here. Tree-sitter in neovim doesn't use an external process like coc.nvim. The parser runtime is a C library embedded into the editor itself (in total not more LOC than syntax.c + highlight.c in vim itself), and parses the buffer in memory and produces a syntax tree that in-process plugins can use (for highlighting but also for other purposes like text objects).

Imran H.

Nov 4, 2021, 1:09:40 PM

to vim/vim, Subscribed

Right now the biggest problem with syntax highlighting is how inconsistent and unpredictable it is. A unified interface will be more than worth the effort.

TextMate will probably be better for keeping the syntax system more integrated & backwards compatible than using something like treesitter. Also the modular and overengineered plugin architecture of treesitter would be a huge departure from the way it is done right now, so we should be a little cautious about how much functionality to reimplement.

bfrg

Nov 5, 2021, 9:17:48 AM

to vim/vim, Subscribed

@bfredl How much longer does it take to load a larger file like src/evalfunc.c in Neovim when tree-sitter is enabled, compared to the default syntax highlighting? I'm assuming that default syntax highlighting is disabled for filetypes where tree-sitter is supported.

Björn Linse

Nov 5, 2021, 10:00:02 AM

to vim/vim, Subscribed

@bfrg src/evalfunc.c from vim (10 000 lines) takes 80 ms more time with tree-sitter enabled for the initial parse (200ms compared to 120ms in my config)

mg979

Nov 6, 2021, 8:45:55 AM

to vim/vim, Subscribed

treesitter is more than just syntax highlighting, it's also useful for text objects for example.

The TextMate system is old. Sublime Text has been mentioned, but it abandoned it years ago in favor of its own syntax engine. Does it make sense to adopt a system that is already waning? And how big is its library if it must be included?

Also when saying that a system is more performant, some source/benchmark should be provided. Is TextMate more performant than treesitter? Who says so?

Linwei

Nov 6, 2021, 1:10:27 PM

to vim/vim, Subscribed

@bfredl, thanks for pointing that out. I have made a new revision:

list of tree-sitter disadvantages:

power consumption: yes tree-sitter is powerful, it will generate AST in real-time, but I am just talking about syntax highlighting, not textobj or indentation. AST generation has its price (see nvim-treesitter/nvim-treesitter#1292).
reliability: nvim-treesitter needs to load an external shared library as the parser for each language. The shared library must be downloaded and compiled into .so files (I know :TSInstall can simplify these steps), and the build process can break if gcc/clang is not installed. The plugin or neovim itself may break due to common dynamic link library problems, e.g. version incompatibility when the plugin has been updated but the parser .so files have not, or dependency conflicts when loading the shared library.
poor parser quality: tree-sitter is a great project, but language parsers are contributed by people all over the world and their quality is not under control (performance issues or inconsistent behavior in different languages).

The biggest risk is parser quality: there are over 100 open issues for parsers:

https://github.com/nvim-treesitter/nvim-treesitter/issues?q=is%3Aissue+is%3Aopen+parser

examples for inconsistency:

example for performance:

The parser quality problem is totally out of control; it is nearly impossible for us to fix all the parsers one by one.

Linwei

Nov 6, 2021, 2:02:07 PM

to vim/vim, Subscribed

@mg979 the core part of textmate syntax system is oniguruma, which is open source and well maintained by the community.

known editors / ides using textmate syntax system:

vscode
textmate itself
eclipse
jetbrains

Monarch was initially built to support languages in VS Code. Then, they decided to switch to TextMate as well because of reasons listed here: microsoft/vscode#174 (comment) .

Some details:

VS Code's tokenization engine is powered by TextMate grammars. TextMate grammars are a structured collection of regular expressions and are written as a plist (XML) or JSON files. VS Code extensions can contribute grammars through the grammar contribution point.

The TextMate tokenization engine runs in the same process as the renderer and tokens are updated as the user types. Tokens are used for syntax highlighting, but also to classify the source code into areas of comments, strings, regex.

Starting with release 1.43, VS Code also allows extensions to provide tokenization through a Semantic Token Provider. Semantic providers are typically implemented by language servers that have a deeper understanding of the source file and can resolve symbols in the context of the project. For example, a constant variable name can be rendered using constant highlighting throughout the project, not just at the place of its declaration.

Highlighting based on semantic tokens is considered an addition to the TextMate-based syntax highlighting. Semantic highlighting goes on top of the syntax highlighting. And as language servers can take a while to load and analyze a project, semantic token highlighting may appear after a short delay.

it is easy to implement textmate syntax highlighting

The regex engine behind the vscode/textmate tokenizer is:

https://github.com/kkos/oniguruma

And here is the wrapper in javascript, it's neatly written and not hard to understand:

https://github.com/microsoft/vscode-textmate

All we need to do is rewrite the javascript wrapper in C,

And thousands of textmate syntax files are ready to use.
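
To make the "rewrite the wrapper in C" idea a bit more concrete, here is a minimal sketch of the inner loop of a regex-rule tokenizer. It runs under loud assumptions and is not vscode-textmate's algorithm: POSIX regex.h stands in for Oniguruma, the Rule and Span types, the rule table and tokenize_line are hypothetical names, and the longest match at the current position wins (ties go to the earlier rule). Real TextMate grammars also have begin/end rules, captures and nested scopes, none of which appear here.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a scope name plus a POSIX extended regex anchored with '^'. */
typedef struct {
    const char *scope;
    const char *pattern;
} Rule;

/* One highlighted span of the input line. */
typedef struct {
    size_t start;
    size_t len;
    const char *scope;
} Span;

/* Toy rule table; the scope names only imitate TextMate naming. */
static const Rule rules[] = {
    { "comment.line",     "^//.*" },
    { "string.quoted",    "^\"[^\"]*\"" },
    { "keyword.control",  "^(if|else|while|return)" },
    { "constant.numeric", "^[0-9]+" },
    { "variable.other",   "^[A-Za-z_][A-Za-z0-9_]*" },
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* Tokenize one line; returns the number of spans written to out. */
size_t tokenize_line(const char *line, Span *out, size_t max_out)
{
    regex_t compiled[NRULES];
    size_t len = strlen(line), pos = 0, n = 0;
    int i;

    /* A real engine would compile each pattern once, not once per line. */
    for (i = 0; i < NRULES; i++) {
        if (regcomp(&compiled[i], rules[i].pattern, REG_EXTENDED) != 0) {
            while (i-- > 0)
                regfree(&compiled[i]);
            return 0;
        }
    }

    while (pos < len && n < max_out) {
        size_t best_len = 0;
        const char *best_scope = NULL;

        /* Try every rule at the current position; keep the longest match. */
        for (i = 0; i < NRULES; i++) {
            regmatch_t m;
            if (regexec(&compiled[i], line + pos, 1, &m, 0) == 0
                    && m.rm_so == 0 && (size_t)m.rm_eo > best_len) {
                best_len = (size_t)m.rm_eo;
                best_scope = rules[i].scope;
            }
        }

        if (best_scope != NULL) {
            out[n].start = pos;
            out[n].len = best_len;
            out[n].scope = best_scope;
            n++;
            pos += best_len;
        } else {
            pos++;  /* no rule matched here: leave the character unscoped */
        }
    }

    for (i = 0; i < NRULES; i++)
        regfree(&compiled[i]);
    return n;
}
```

Each Span would then be mapped to a highlight group by the editor; that mapping is left out of the sketch.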

lacygoill

Nov 6, 2021, 3:04:02 PM

to vim/vim, Subscribed

No more than 4854 lines (including comments) in javascript/typescript

Tests excluded, it's 3779 lines of code (source: cloc(1)).
Tests included, it's 5074 lines of code.

lacygoill

Nov 6, 2021, 3:05:46 PM

to vim/vim, Subscribed

Why not Sublime grammar instead of TextMate grammar? It seems more powerful, and easier to read.

I think .sublime-syntax is more easy to write and readable.

source

Sublime text 3 has implemented a new grammar format that seems much better than the traditional textmate grammar.

source

Is it because there have been fewer .sublime-syntax files written than .tmLanguage ones? Is there a licensing issue with these files?

Linwei

Nov 6, 2021, 3:12:13 PM

to vim/vim, Subscribed

@lacygoill maybe textmate grammar is a little easier ? because there are reference implementations:

But sublime is closed source ? we need write it from scratch ??

lacygoill

Nov 6, 2021, 3:31:20 PM

to vim/vim, Subscribed

But sublime is closed source ? we need write everything from scratch ??

Good point. I forgot that sublime was closed source.

Is TextMate much better (readability, reliability, performance) than our current syntax highlighting mechanism?

Just for TypeScript alone, there have been 754 reported bugs, 41 remaining open currently.

Assuming we support TextMate, what would happen to our current issues related to syntax highlighting? Do we close them, and tell their authors to use the new syntax highlighting mechanism? If the users find issues in TextMate grammar files, do we accept their reports on this bug tracker? IOW, is it going to help reduce the number of remaining open issues here?

Linwei

Nov 6, 2021, 3:48:25 PM

to vim/vim, Subscribed

Because TypeScript is a new language that evolves quickly?

Oniguruma + a JSON-like config is certainly faster than vim's current mechanism. People seldom encounter performance issues in syntax highlighting when using textmate/vscode/eclipse/jetbrains.

Sublime's grammar seems more readable and powerful than textmate's; maybe oniguruma+config can achieve such a thing.

lacygoill

Nov 6, 2021, 6:32:17 PM

to vim/vim, Subscribed

I remember an issue where Vim was very slow when adding/removing text properties on CursorMoved. It only occurred while the syntax highlighting was enabled. So, one might think that the latter was the culprit. It turns out that the syntax highlighting was fine; the issue was Vim redrawing the screen too much.

With regard to how people perceive the current syntax highlighting as being too slow, I wonder which part of the issue comes from the syntax highlighting itself, and which part from something else (like too much redrawing).

People seldom encounter such issues in syntax highlighting when using textmate/sublime2/vscode/eclipse/jetbrains.

That's interesting. I hope it's really thanks to their own syntax highlighting mechanism, and not some other optimizations (like multithreading).

mg979

Nov 7, 2021, 6:43:14 AM

to vim/vim, Subscribed

A couple of remarks:

with vim system it's really easy to add custom groups to extend current syntax in after/syntax, would it be possible to do that with TextMate as well?
as @lacygoill said, there could be other bottlenecks (too much redrawing), that would limit TextMate performance in the same way, isn't it better to investigate those first?
programs with GUI use multithreading and this surely helps them
sometimes slow syntax highlighting depends on how (badly) the syntax script is written (for example the default vimscript syntax is obscenely slow), and it would be faster with some changes in the script

I think performance of vim syntax highlighting could be improved before trying alternatives, for example:

there are known problems with folding, it would help to fix those
how much of the syntax is recalculated in insert mode? I think only the part of text from the insertion point up to the last visible line in the window should be recalculated, is this the case or does vim do a full update on every keystroke?
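
The "insertion point up to the last visible line" idea can be written down as a small range computation. This is only a sketch of the suggestion, not a description of what Vim does today; LineRange and lines_to_rehighlight are hypothetical names, and line numbers are assumed to be 1-based and inclusive.

```c
/* Inclusive range of lines to repaint; first > last means "nothing to do". */
typedef struct {
    long first;
    long last;
} LineRange;

/*
 * Lines that may need re-highlighting after an edit on edit_line, given a
 * window that shows top_line..bot_line.  Lines above the edit cannot change,
 * and lines below the window are not visible.
 */
LineRange lines_to_rehighlight(long edit_line, long top_line, long bot_line)
{
    LineRange r;

    /* An edit above the window can still change the state the window starts in. */
    r.first = edit_line > top_line ? edit_line : top_line;
    r.last = bot_line;
    return r;
}
```

Note that for an edit above the window, the highlighter still has to recover the syntax state at top_line somehow (by re-lexing from the edit or by syncing); the sketch only answers which lines need repainting.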

Stephan Seitz

Nov 8, 2021, 7:12:29 PM

to vim/vim, Subscribed

I want to add that we currently have no safeguards for tree-sitter of the kind that are applied to regex-based highlighting, like limiting the line number or doing background parsing like Atom would do.

Background syntax highlighter is still immature

I think background syntax highlighting (if you refer to asynchronous or separate threads highlighting) is neither implemented for tree-sitter nor for traditional vim highlighting. The possibility to make a fast thread-safe copy of the parsing state for tree-sitter or any other kind of multithreading is not used at the moment in Neovim.

Many of the issues you cited complained about features missing due to missing :h syntax. It will always be difficult to transition from one syntax system to another, especially when it is as widely supported as vim syntax/fold/indent files. Maybe it would be easier to maintain more compatibility with a system that works more similarly.

About quality of the grammars, you surely have different trade-offs. VS-Code has significantly more users than Atom and Nightly-Neovim. Tree-sitter parses the whole document, which can help with complex syntax constructs and large-scale structure. However, it will more easily get confused when it sees something that cannot be handled by the language grammar (preproc-constructs or non-standard language extensions), while regexes with a more local view are often still ok. The error recovery capabilities vary a lot depending on how the concrete grammar is written. Tree-sitter provides something in-between regex highlighting and LSP-like semantic highlighting, so it might not be necessary if the two latter are available for a language. Distributing binaries is another challenge for tree-sitter. Arbitrary code execution through custom scanners enables the highest flexibility but may also pose a security risk if the parsers are not self-generated and the scanner code is not reviewed.

Andrey Mishchenko

Nov 9, 2021, 2:55:47 PM

to vim/vim, Subscribed

For those who haven't seen it, this is an excellent introduction to Tree-sitter, by the author: https://www.youtube.com/watch?v=Jes3bD6P0To&ab_channel=StrangeLoopConference

tl;dr: Tree-sitter is a (portable, dependency-free) C library which (conceptually) takes a grammar (expressed in JavaScript) and a source file, and returns a parse tree for the source file with respect to the grammar. The big selling point is that TS (claims that it) can handle syntax errors well (still return a reasonable parse tree) and that it is incremental (returns new parse trees efficiently/quickly given some code edits and previous trees).

Parsers for different languages are provided by the community and while I haven't seen this first-hand, I find it easy to believe that many of them are not great. But the project is much younger than TextMate, and GitHub uses it for its on-web syntax highlighting so there might be some corporate support there.

Personally, the thing I would be most excited about seeing is Vim exposing a representation of the syntax tree which can be used not just for syntax coloring but also for semantic editing (expand visual selection one AST node up, copy function body, etc.). IDK how well the Vim architecture supports this today. But in theory you could then plug in whatever parse-tree-generator you choose (Tree-sitter or TextMate).

If you are using an LSP language server, it's true that the LS can give you a parse tree (one which is even more accurate, esp. in the case of context-sensitive grammars like C++), but a language server (which Vim also doesn't natively support yet) will always be slower (it will do more than a parser, for example it will resolve cross-file deps and so on) and therefore will have to be async and higher-latency. So I think there is room for both a fast incremental parse system (like Tree-sitter) and LSP support (for things like go-to-definition and find usage).

See also this discussion in the VSCode repo: microsoft/vscode#50140

fcurts

Nov 19, 2021, 2:14:19 PM

to vim/vim, Subscribed

As someone who has spent months writing and maintaining TextMate and tree-sitter grammars for real-world languages, let me tell you that the TextMate grammar system is totally broken, at least from a 2021 perspective. TextMate grammars are a nightmare to maintain and impossible to get right. Out of desperation, I even developed my own macro system (just like the authors of TypeScript's TextMate grammar), and it was still a nightmare.

tree-sitter is in a completely different league. It's a top-notch incremental parser that can be used for accurate (!) syntax highlighting, code folding, code formatting, etc. tree-sitter grammars are dramatically easier to write and maintain, and it's actually possible to get them right. GitHub has been using tree-sitter for a while, and VSCode is also starting to use it (see https://github.com/microsoft/vscode-anycode).

Betting on TextMate grammars in 2021 would be an engineering crime.

Imran H.

Nov 20, 2021, 7:19:14 AM

to vim/vim, Subscribed

I am not sure how much of your hyperbolic speech can be deemed accurate, but from what I can see one of the biggest problems with tree-sitter is the general low quality of parsers contributed by different people, as pointed out by the OP. "Top-notch" is not the way I would describe it. This certainly needs to be taken into account, as it would require a vast amount of effort to deal with these issues, which Vim would inherit as a result of undertaking the HUGE project of integrating tree-sitter.

I can't speak for textmate grammar for lack of familiarity. Personally my biggest problem with tree-sitter (at least the way neovim does it) is its dependency on the environment (gcc/clang), large binary size and the do-it-all mentality which suits neovim but definitely does not feel like the "vim way".

Bram Moolenaar

Nov 20, 2021, 7:23:34 AM

to vim/vim, Subscribed

> As someone who has spent months writing and maintaining TextMate and
> tree-sitter grammars for real-world languages, let me tell you that
> the TextMate grammar system is totally broken, at least from a 2021
> perspective. TextMate grammars are a nightmare to maintain and
> _impossible_ to get right. Out of desperation, I even developed my own
> macro system (just like the authors of TypeScript's TextMate grammar),
> and it was still a nightmare.
>
> tree-sitter is in a completely different league. It's a top-notch
> incremental parser that can be used for accurate (!) syntax
> highlighting, code folding, code formatting, etc. tree-sitter grammars
> are dramatically easier to write and maintain, and it's actually
> possible to get them right. GitHub has been using tree-sitter for a
> while, and VSCode is also starting to use it (see
> https://github.com/microsoft/vscode-anycode).
>
> Betting on TextMate grammars in 2021 would be an engineering crime.

Thanks for your opinion. Making it easier/simpler/better to write a
parser is an important goal. So we should look at the best way to use
tree-sitter. That it compiles each parser into an executable seems like
a disadvantage. Perhaps this is OK for often used languages, but a way
to add a parser at runtime would be really useful.

--
TIM: Too late.
ARTHUR: What?
TIM: There he is!
[They all turn, and see a large white RABBIT lollop a few yards out of the
cave. Accompanied by terrifying chord and jarring metallic monster noise.]
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

jgb

Nov 20, 2021, 8:08:34 AM

to vim/vim, Subscribed

tree-sitter is in a completely different league. It's a top-notch incremental parser that can be used for accurate (!) syntax highlighting, code folding, code formatting, etc. tree-sitter grammars are dramatically easier to write and maintain, and it's actually possible to get them right. GitHub has been using tree-sitter for a while, and VSCode is also starting to use it (see https://github.com/microsoft/vscode-anycode).

If tree-sitter is top-notch, how come a ubiquitous and highly popular language like python has been broken in it for quite a while?
When I tested neovim 0.5.1 with tree-sitter I ended up having to disable TS for python (which is the language I use the most) because the indenting and highlighting were unusable. Doesn't exactly inspire confidence.

Christian Clason

Nov 20, 2021, 8:23:12 AM

to vim/vim, Subscribed

I think this discussion is devolving more and more from the purely technical and into prejudices. It is very important here to distinguish

1. tree-sitter (the engine, which I would agree with @fcurts is an excellent piece of software and fundamentally superior to other syntax engines);
2. Neovim's integration of tree-sitter, which is still marked "experimental" for a reason (and should be further separated into the fundamental integration and API in core -- which works rather well already -- and its use for syntax highlighting, folding, indentation etc. -- which is very much work in progress);
3. The individual language parsers (and queries), which are externally maintained.

I think Vim should at this stage focus on 1. to make a reasoned decision (while it of course makes good sense -- and would make me very happy -- to take Neovim's approach and decisions for 2. into account; admitting that the two projects have different needs).

And I find it highly disingenuous to point fingers at 3. while ignoring that the quality of TextMate grammars (and, indeed, Vim's bundled syntax files) varies wildly as well. It's clear that (just like Neovim) you cannot simply switch engines and have to support both (on a per-language basis) for some time until the replacement catches up.

fcurts

Nov 20, 2021, 9:44:26 AM

to vim/vim, Subscribed

I was obviously talking about the engine, which is what matters in the long run. Regarding existing grammars, the difference is that tree-sitter grammars can be improved relatively easily because they can be reasoned about. On the other hand, improving real-world TextMate grammars is anywhere from difficult to impossible. (Often, fixing one problem causes an inexplicable problem somewhere else, which is only discovered later.)

I can't comment on integration aspects. I'm not even a Vim user. But as a language/tooling developer myself, I feel strongly that it's time to move past TextMate grammars, which is why I offered my insights. Good luck!

Stephan Seitz

Nov 20, 2021, 10:32:58 AM

to vim/vim, Subscribed

If tree-sitter is top-notch, how come a ubiquitous and highly popular language like python has been broken in it for quite a while?
When I tested neovim 0.5.1 with tree-sitter I ended up having to disable TS for python (which is the language I use the most) because the indenting and highlighting were unusable. Doesn't exactly inspire confidence.

@jgb Indentation has nothing to do with tree-sitter itself. There is a very ad-hoc implementation of using the parsed tree as indentexpr. Python indentation is not working because this implementation just considers the syntax node you are currently on, which is nothing in the case of the Python parser, because the relevant syntax node ended in the previous line when you start a new one. One would have to add a rule that respects this case or tune the general logic at this point.

You always have to write some system that translates your parsed representation to indents. The quality of this translation says nothing about the quality of the representation itself.

Isopod

Dec 23, 2021, 10:14:18 AM

to vim/vim, Subscribed

As someone who recently spent some time writing a TreeSitter grammar, I have also become less enthusiastic of the project. I watched the author’s presentation a while ago and it sounded like the greatest invention since sliced bread, but in practice it doesn’t always work that well.

The biggest obstacle in my opinion is languages with preprocessors (e.g. C and C++). This isn’t something I had considered initially, but it is simply impossible to parse those languages with TreeSitter because you’re dealing with a language within a language. Now before someone mentions this: I know TreeSitter supports injections, e.g. JavaScript in HTML, but that’s not the same thing because, as I understand, each injection is essentially its own “program”. It’s fundamentally not possible to parse pre-processed languages with a context-free grammar. If you think about it, conditional compilation is as context-sensitive as it gets.

I’m talking about constructs like this:

#if FLAG
  if (foo) {
#endif

  bar;

#if FLAG
  }
#endif

Or this:

#define BEGIN_FUNC void func(void) {
#define END_FUNC }

BEGIN_FUNC
  bla;
END_FUNC

Or this:

#define RENAME(x) renamed_ ## x

void RENAME(my_func)(void) {
  bla;
}

How is TreeSitter supposed to generate an AST for such code if it doesn’t interpret the macros? It’s simply impossible. And often this will result in parse errors. Now, TreeSitter is in theory “fault tolerant”, so it should be able to recover from errors, but I’ve found that it often recovers in a weird, unpredictable way that causes syntax highlighting to be messed up. It gets even worse when we’re talking about using it for features like syntax-aware selections, indentations and folds: Just forget about it.

All TreeSitter grammars for preprocessed languages contain hacks to work around this issue, but they never work 100%. They just handle a few special cases, but blow up in the general case.

The next problem is that parsing is incredibly slow. I benchmarked parsing a 4 MB file and it took over a second. Depending on where you are coming from, that might not sound too bad, but 4 MB a second really isn’t impressive when you consider that modern RAM can handle tens of gigabytes per second. Quite frankly, I’m not sure this “incremental parsing” approach is all that useful when the implementation is so slow in practice. I guarantee I could write a hand-rolled parser that would just reparse the entire file on every edit and it would still be orders of magnitude faster.

I’ve also found that syntactic highlighting doesn’t actually add that much value over a simple lexer, but it is significantly more complex. Semantic highlighting on the other hand is even more complex, but it also adds a lot of value. If I had to rate the cost-benefit relationship, I’d say: lexer > semantic > syntactic.

If I had to design a syntax highlighting system from scratch, I’d probably just go with a simple C API, something like this:

typedef enum {TOK_IDENT, TOK_STRING, TOK_OPERATOR, /* ... */} Token;

void highlight_tokens(const char *buf, size_t len, Token *tokens, const void *input_state, void *output_state, size_t state_size);

You just pass a chunk of data to the parser and then it returns a buffer with a character class for each character (or maybe an array of ranges, see also LSP for a similar approach). This is the most general form, giving you the greatest amount of flexibility. You could hand-roll a parser, or build one based on regexes or TreeSitter grammars or whatever. It doesn’t restrict you to a particular system.
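
As an illustration of what sits behind such an API, here is a minimal hand-rolled lexer that fills one character class per input byte and carries its state between chunks. It is a sketch under assumptions: TOK_OTHER, TOK_COMMENT and the LexState struct are additions not in the post, chunks are assumed to end on line boundaries (so a `/*` or `*/` is never split across two calls), and anything that is not a comment, string, word character or operator is lumped into TOK_OTHER.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* TOK_OTHER and TOK_COMMENT are additions made for this sketch. */
typedef enum { TOK_OTHER, TOK_IDENT, TOK_STRING, TOK_OPERATOR, TOK_COMMENT } Token;

/* Hypothetical lexer state carried from one chunk to the next. */
typedef struct {
    int in_comment;   /* nonzero while inside a block comment */
} LexState;

void highlight_tokens(const char *buf, size_t len, Token *tokens,
                      const void *input_state, void *output_state,
                      size_t state_size)
{
    LexState st = { 0 };
    size_t i = 0;

    /* Resume from the state the previous chunk ended in, if any. */
    if (input_state != NULL && state_size >= sizeof st)
        memcpy(&st, input_state, sizeof st);

    while (i < len) {
        unsigned char c = (unsigned char)buf[i];
        int has_next = i + 1 < len;

        if (st.in_comment) {
            tokens[i++] = TOK_COMMENT;
            if (c == '*' && has_next && buf[i] == '/') {
                tokens[i++] = TOK_COMMENT;
                st.in_comment = 0;
            }
        } else if (c == '/' && has_next && buf[i + 1] == '*') {
            tokens[i++] = TOK_COMMENT;
            tokens[i++] = TOK_COMMENT;
            st.in_comment = 1;
        } else if (c == '"') {
            /* String literal: runs to the closing quote or the end of the chunk. */
            tokens[i++] = TOK_STRING;
            while (i < len) {
                char s = buf[i];
                tokens[i++] = TOK_STRING;
                if (s == '\\' && i < len)
                    tokens[i++] = TOK_STRING;   /* escaped character */
                else if (s == '"')
                    break;
            }
        } else if (isalnum(c) || c == '_') {
            tokens[i++] = TOK_IDENT;            /* word character */
        } else if (c != '\0' && strchr("+-*/%=<>!&|^~", c) != NULL) {
            tokens[i++] = TOK_OPERATOR;
        } else {
            tokens[i++] = TOK_OTHER;
        }
    }

    /* Hand the end-of-chunk state back to the caller. */
    if (output_state != NULL && state_size >= sizeof st)
        memcpy(output_state, &st, sizeof st);
}
```

A caller would feed the output state of one line in as the input state of the next, so a block comment opened on one line keeps colouring the following lines until it is closed.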

I’d even consider getting rid of the state persistence stuff and just pass one large buffer containing the entire file and reparse the whole file every time. Because in the general case, you have to do it anyway. Consider putting a comment /* at the beginning of a very large file. No matter what you do, sometimes, you’ll have to reparse everything, so I’m not sure it is even worth adding complexity to save time for only some edits. Better work on making the parser really fast. Computers are fast, it shouldn’t take that long to parse even a 100 MB file. And source files are usually much smaller than this.
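
The trade-off described here can be made visible with a small sketch of the "state persistence" approach: cache the lexer state at the end of every line, and after an edit re-lex downwards only until the recomputed state matches the cached one. Everything in it is hypothetical (line_end_state, relex_from, and a state that only tracks block comments, ignoring strings and `//` comments); it is meant to show when the cache helps and when it cannot.

```c
#include <stddef.h>

/* Returns 1 if a block comment is still open at the end of the line. */
static int line_end_state(const char *line, int in_comment)
{
    for (size_t i = 0; line[i] != '\0'; i++) {
        if (in_comment) {
            if (line[i] == '*' && line[i + 1] == '/') {
                in_comment = 0;
                i++;    /* skip the '/' */
            }
        } else if (line[i] == '/' && line[i + 1] == '*') {
            in_comment = 1;
            i++;        /* skip the '*' */
        }
    }
    return in_comment;
}

/*
 * Re-lex starting at line index `edited`.  end_state[i] caches the state at
 * the end of line i (-1 means "unknown").  Stops as soon as a recomputed
 * state equals the cached one, because later lines cannot be affected.
 * Returns the number of lines that had to be re-lexed.
 */
size_t relex_from(const char *const *lines, size_t nlines,
                  int *end_state, size_t edited)
{
    size_t relexed = 0;
    int state = edited > 0 ? end_state[edited - 1] : 0;

    for (size_t i = edited; i < nlines; i++) {
        int new_state = line_end_state(lines[i], state);
        int unchanged = (new_state == end_state[i]);

        end_state[i] = new_state;
        relexed++;
        if (unchanged)
            break;
        state = new_state;
    }
    return relexed;
}
```

With a four-line buffer and a cache initialised to -1, the first call re-lexes all four lines. Changing an ordinary line in the middle then re-lexes just that one line, while inserting an unclosed `/*` on the first line re-lexes all four again, which is the worst case the paragraph above describes.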


Linwei

Dec 23, 2021, 2:10:00 PM

to vim/vim, Subscribed

Anyone who eagerly promotes tree-sitter here should answer my questions above first. Repeating its advantages a thousand times does not mean that these fatal problems will disappear.

Tree-sitter is not a new thing, no need to be so excited. Remember that Atom adopted tree-sitter back in 2018, and users in the Atom community are very calm about this "new" feature.

I don't need a better highlighter at the cost of performance and flexibility. I am suffering from performance issues right now, and all I want is fast & static regex-based highlighting.

@lacygoill you claimed in this comment that the problem was caused by "drawing too much".

That's not true, I have done a bisect investigation in this problem here:

#2712

And found that there was a big performance regression after 8.0.643 and 8.0.647. You can simply compare the syntax highlighting speed of vim 7.4 and the latest vim 8.2.xxxx and you will find that this is by no means a simple "drawing too much" problem.


lacygoill

Dec 23, 2021, 4:31:30 PM

to vim/vim, Subscribed

@lacygoill you claimed in this comment that the problem was caused by "drawing too much".
That's not true,

It is. The patch that fixed my issue only reduced how often Vim was redrawing the screen:

diff --git a/src/textprop.c b/src/textprop.c
index b6cae70a8..e74c13849 100644
--- a/src/textprop.c
+++ b/src/textprop.c
@@ -809,6 +809,7 @@ f_prop_remove(typval_T *argvars, typval_T *rettv)
     int                id = -1;
     int                type_id = -1;
     int                both;
+    int                is_removed = FALSE;

     rettv->vval.v_number = 0;
     if (argvars[0].v_type != VAR_DICT || argvars[0].vval.v_dict == NULL)
@@ -889,6 +890,7 @@ f_prop_remove(typval_T *argvars, typval_T *rettv)
                if (both ? textprop.tp_id == id && textprop.tp_type == type_id
                         : textprop.tp_id == id || textprop.tp_type == type_id)
                {
+                   is_removed = TRUE;
                    if (!(buf->b_ml.ml_flags & ML_LINE_DIRTY))
                    {
                        char_u *newptr = alloc(buf->b_ml.ml_line_len);
@@ -920,7 +922,8 @@ f_prop_remove(typval_T *argvars, typval_T *rettv)
            }
        }
     }
-    redraw_buf_later(buf, NOT_VALID);
+    if (is_removed)
+       redraw_buf_later(buf, NOT_VALID);
 }

As anyone can see, the patch did one thing, and one thing only: it put a condition on redraw_buf_later(); the latter can only be invoked if is_removed is true:

if (is_removed)

It did nothing else. And yet, it was enough to fix the issue.

> I have done a bisect investigation in this problem here:
>
> Syntax highlighting is extremely slow when scrolling up in recent version (v8.0.1599) #2712

This has nothing to do with my comment. It's an entirely different issue. The only way your comment might be relevant would be if I had written:

whenever Vim is slow, it's because it redraws the screen too much

But I did not say that. And the comment you link did not say that either.

I wrote that in my issue, the cause was too much redraw.
I did not write that in all issues, the cause was too much redraw.

Two last notes before I unsubscribe from this thread.

Asking for questions or clarifications is OK, but saying that I lie is not. I don't want to read anything from you anymore, so I've blocked you.
I don't care whether Vim integrates tree-sitter, TextMate, or whatever software is trending right now. All I care about is how reliable Vim is.

Linwei

Dec 24, 2021, 4:23:34 AM

to vim/vim, Subscribed

@lacygoill, sad to hear that. I have been following you on GitHub for years, reading your posts in the issues, and studying your early vim9 plugin projects. All I meant was "your speculation may be wrong". Saying that I accused you of lying is a bit of an overreaction.

You just blocked a faithful follower.

icedman

Mar 22, 2022, 2:11:10 AM

to vim/vim, Subscribed

I made the TextMate parser portable by removing the macOS Foundation code. It may be worth testing as a Vim plugin, but making one is beyond my skill set.

https://github.com/icedman/tm-parser

This library works well in my editor projects, including an ncurses-based editor. It works well enough with my Flutter app also: the Ashlar Code app for Android (munchyapps.com).

Martin Tournoij

Jun 3, 2022, 12:03:51 PM

to vim/vim, Subscribed

Last year I wrote a plugin to highlight things with LPeg; I chose LPeg because I liked the way it works, vis already uses it, and there are a reasonable number of syntax files already available.

I got bored with it and never finished/published it; I think there were still some remaining issues, but I forgot what they are/were. I think most were related to using text properties to apply highlights, rather than LPeg itself, but not sure. Maybe I'll work on it some more and get it to at least a "publishable/experimental" state.

I also spent quite some time looking at tree-sitter; actually, that was what I originally wrote the plugin for, and came to the conclusion I don't care much for tree-sitter, or at least not for editors. One of the really great features of Vim's current syntax highlighting is that it's pretty easy to modify by users. Based on my experience answering questions on the Vi Stack Exchange people want to do this all the time: they want to highlight some keywords as errors; don't like how this or that is highlighted and want something different, they want to highlight their own project-specific things, etc. Tree-sitter makes that much harder, and I'd consider it a huge UX regression.

Even in "normal" usage there's an entire circus around managing it for end-users; you can't just "drop a file in ~/.vim/syntax/mylanguage.vim" or "~/.vim/after/syntax/mylanguage.vim", you need to compile shared objects with NodeJS and whatnot. The nvim-treesitter plugin manages all of that for you, but a plugin to manage all the circus is putting lipstick on an ugly pig IMO.

I also don't like the way tree-sitter syntax files are written in the first place; other people mentioned that many tree-sitter highlights aren't all that great, and that matches my experience too. My first instinct was "okay, so let's improve this!" but I found that quite hard and gave up after mucking about for a while with very limited success. I think that syntax being hard to write in tree-sitter is probably the reason so many syntaxes aren't so great in the first place. I certainly don't see how tree-sitter is "fundamentally superior to other syntax engines" as someone mentioned in this thread; this seems like some truism that keeps getting repeated, but I haven't seen any reasons why this should be the case (and I did try to find reasons).

Overall I do think the "tree-sitter approach" of more structured parsing is the better approach, I just don't think that tree-sitter is an especially great fit for Vim. I don't know why Neovim went with tree-sitter specifically: as near as I can determine it's just because someone wrote a patch for that – I couldn't really find any discussions about it. Interestingly Neovim does use LPeg internally for some things, I don't know if it was considered – or maybe it was, I very well may have missed some discussions somewhere.

I don't have any opinion on TextMate's system, as I didn't look at it, but when I started working on all of this and evaluating options I wrote down the following requirements:

1. Reasonably fast, even for large files, and it doesn't break.

2. Reasonably easy to modify, including by "normal" users such as sysadmins, scientists (in fields other than comp-sci), and just regular hobbyists who are not professional developers.

3. Readability and maintenance are important. Right now syntax files are a bit of a "write only, hopefully never read"-affair.

4. Easy to manage; it should "just work" after dropping a new file in your ~/.vim/ without muckery.

There are a million-and-one parser generators, tools, and so forth out there. It's literally people's entire career to research these kinds of things and write tools for them.

Many of them fit requirement 1 ("fast and correct"), but most of them are not especially user-friendly. EBNF (and variants thereof) is more or less the standard for describing languages, but do you really want this as the basis for your syntax highlighting? Probably not.

This is actually a great feature of the current syntax system: you can add, remove, and modify things fairly easily. "I don't like this highlight" or "I want to add a new highlight for X" should be something a fairly experienced dev can do in under an hour. LPeg mostly retains this feature: you can still say "yo dawg, highlight this for me, kthxbye" or "eww, I don't like this, get rid of it!" and be done with it.

Without detailing all the solutions I looked at, I eventually settled on LPeg because of all the solutions I found I felt it had the best combination of correctness and UX.

I still think these are good requirements. It's quite possible there are existing tools out there that do a better job than LPeg, but IMHO tree-sitter very much doesn't.

Martin Tournoij

Jun 3, 2022, 12:37:59 PM

to vim/vim, Subscribed

I put my LPeg plugin over here: https://github.com/arp242/lpeg.vim

Like I said in my previous comment, I haven't worked on it for quite a while, but I did some spot-checking and seems to work fairly decently. Much of it is stolen^H^H^H^H^H^H inspired by vis.

Stephan Seitz

Jun 3, 2022, 1:37:41 PM

to vim/vim, Subscribed

If anyone wants to have a look at a native implementation (i.e. compiled, without runtime dependencies), I would look into bat: https://github.com/sharkdp/bat. It uses a native library called syntect that reads TextMate grammars (https://crates.io/crates/syntect/1.7.1), which is probably good enough for trying this out in Vim before writing a C implementation.

Linwei

Jul 8, 2022, 8:18:45 AM

to vim/vim, Subscribed

@theHamsta , sublime grammar is also a good choice:

Uriel Acioli

Aug 8, 2022, 2:08:16 AM

to vim/vim, Subscribed

tree-sitter's highlights have a lot of quality issues. Syntect, and therefore TextMate grammars, in Vim would be a game changer, at least for me. And since Rust has very good FFI support for C, I think it might be feasible to integrate Syntect's library with Vim.

Linwei

Aug 8, 2022, 6:19:24 AM

to vim/vim, Subscribed

Aside from tree-sitter's poor parser quality, the fatal issue with tree-sitter highlighting is portability.

If we want to encourage people to create diverse syntax highlighting, we must provide something simple, straightforward, and easy to learn for most users.

When we are using text-based grammar files (vim syntax/TextMate/Sublime syntax), it is very easy to make modifications and create a new one. For example, I can change the cpp.vim to a new version to highlight some keywords/rules dedicated to my project or to meet the latest c++ standard if the original author is too busy to update.

With tree-sitter, by contrast, the syntax highlighting rules are hard-coded into the parsers. Even if you only want to make a small change, you are required to change the parser yourself and build a new .so file for your target platform.

Changing a parser is much more complex than changing a text-based grammar file.

BTW: Tree-sitter is written in rust.

All of our grammar authors are capable of writing rust now?
Everyone has already installed a rust development environment on their computer?

As far as I know, many Vim users still don't have a gcc environment to build Vim themselves.

Andrey Mishchenko

Aug 8, 2022, 7:38:44 AM

to vim/vim, Subscribed

In case @skywind3000 decides to edit his post: he is boldly claiming that (1) Tree-sitter is written in Rust, (2) you have to write Rust code to create TS grammars, and (3) you cannot change TS highlighting at runtime. These claims are all very false. Since he has shown that he is willing to completely make things up to support his point, anything he says should be taken with a grain of salt.

Uriel Acioli

Aug 8, 2022, 7:46:17 AM

to vim/vim, Subscribed

@skywind3000 not all of them know Rust but since tree-sitter has a C core, wrapped in Rust (like Deno's V8) and because that gets built and delivered as a npmjs package, grammar authors do their thing in JavaScript.

But a TextMate compatible parser, like Syntect could probably be less of a hassle for the end user, just use those .json syntax files from VSCode/Sublime Text, modifying it would mean just editing some .json.

As it stands today, if you dislike a tree-sitter highlight, changing it requires writing a subset of Scheme and/or tweaking tree-sitter using bindings in a language supported by your editor.
Even then, you'd also need to get knee-deep into the third-party tree-sitter grammar. So although grammars are easy to make, editing them isn't as easy as extending a .json syntax file, like Sublime/TextMate grammars.

On the editor ecosystem side, having a TextMate grammar means much less work to port extensions from TextMate, VSCode and Sublime Text to Vim than with tree-sitter.

Christian Clason

Aug 8, 2022, 7:54:02 AM

to vim/vim, Subscribed

> On the editor ecosystem side, having a TextMate grammar means much less work to port extensions from TextMate, VSCode and Sublime Text to Vim than with tree-sitter.

It's no skin off my nose either way, but just for the sake of completeness: going with tree-sitter would mean even less work porting from Neovim -- a "sister editor" that explicitly strives for Vim compatibility?

Uriel Acioli

Aug 8, 2022, 8:21:15 AM

to vim/vim, Subscribed

> It's no skin off my nose either way, but just for the sake of completeness: going with tree-sitter would surely mean even less work porting from Neovim -- a "sister editor" that explicitly strives for Vim compatibility?

@clason, going with tree-sitter means, for now, choosing Neovim compatibility over Sublime Text, VSCode and TextMate.
Following conventions means that features and innovations from other tools that follow the same conventions can be introduced into Vim with less work, as with the Language Server Protocol's conventions.

Linwei

Aug 9, 2022, 12:29:22 AM

to vim/vim, Subscribed

@clason , I admit that I am not aware of the parser generation part of tree-sitter, it is indeed my mistake to state it was written in rust.

A mistake is a mistake, I will not edit and revert my post.

But my core point still stands:

It is harder to customize parsers, even if they are generated from a JS specification of the grammar.
Anyone who really cares about semantic-based syntax highlighting can use LSP clients like coc or ycm; they already provide a clangd-based highlighter which is more robust and precise.

errael

Aug 9, 2022, 12:10:23 PM

to vim/vim, Subscribed

Maybe the first thing to do is a syntax highlighting interface, SHI, for vim. It could be set up such that if something adheres to the interface, it could be compiled with vim, added as a shared library, there can be an LSP adapter for SHI. The interface could support async/concurrent operation.

It's been mentioned that there are additional uses for a true language parser, such as folding info. Is it reasonable or useful to have multiple SHI active at one time? Internally vim could synthesize/merge the results from multiple sources.

Considering the heated interest in this topic, maybe Syntax/Highlighting Interface Tools, is better or more accurate.

icedman

Aug 17, 2022, 10:57:45 PM

to vim/vim, Subscribed

I went ahead and made a Textmate plugin. It is currently for nvim though.

https://github.com/icedman/nvim-textmate

Coded in C/C++ and Lua; uses a modified version of MacroMates' open-sourced TextMate app.

Nowhere near ready, but the speed already looks promising.

The syntax highlight output is similar to Treesitter. Treesitter has some other cool features, but it crawls when editing or even just opening large files. Example: the amalgamated sqlite3.c source, 200k lines.

icedman

Aug 31, 2022, 9:35:52 AM

to vim/vim, Subscribed

textmate-based syntax highlighting for vim
https://github.com/icedman/vim-textmate

Stephan Seitz

Aug 31, 2022, 7:16:14 PM

to vim/vim, Subscribed

> Does it mean that we can do everything through a .scm file without changing the parser ?

You can select every part of the parsing result and define relations between different CST nodes (e.g. select a function that has three arguments with the third starting with a vowel). You are a bit limited in that you can only select nodes of the syntax tree, not individual characters directly (without custom functions).
Custom functions can be interpreted by the editor. They can also be used to select subranges of a node or to filter out results with custom logic. That's usually enough for syntax highlighting. In Neovim's implementation, you can register the mentioned custom functions via Lua.
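
As a rough illustration of the "three arguments, third starting with a vowel" example, here is a small Python sketch. It does not use tree-sitter or its query language. The `Node` class, the node kinds `call` and `argument`, and the functions `walk`, `select` and `third_arg_starts_with_vowel` are all hypothetical stand-ins, meant only to show the idea of selecting tree nodes by kind and then filtering them with a custom predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Hypothetical syntax-tree node: a kind, its source text, and children."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield `node` and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(root: Node, kind: str, predicate: Callable[[Node], bool]) -> List[Node]:
    """Select nodes of a given kind, then filter them with a custom predicate."""
    return [n for n in walk(root) if n.kind == kind and predicate(n)]


def third_arg_starts_with_vowel(call: Node) -> bool:
    """Custom predicate: exactly three arguments, the third starting with a vowel."""
    args = [c for c in call.children if c.kind == "argument"]
    return len(args) == 3 and args[2].text[:1].lower() in {"a", "e", "i", "o", "u"}


def make_call(name: str, *args: str) -> Node:
    """Helper for building example trees."""
    return Node("call", name, [Node("argument", a) for a in args])


tree = Node("source", children=[
    make_call("f", "x", "y", "index"),
    make_call("g", "a", "b"),
    make_call("h", "p", "q", "r"),
])
matches = [n.text for n in select(tree, "call", third_arg_starts_with_vowel)]
print(matches)  # ['f']
```

The selection step only ever sees whole nodes, which mirrors the limitation mentioned above. Anything finer-grained has to go through a predicate like the one here.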

> what if language standard evolves ? still no need to change the parser ??

Yes, changes to the language require updating, regenerating and compiling the parsers. Like TextMate grammars, the parser definitions are shared between editors. With the tree-sitter integration into Neovim the community got quite active, so typically new features get added quite quickly. In the case of nvim-treesitter, each plugin revision contains a lockfile with the parser revisions we have tested on CI to be compatible with the highlight queries (when new language features should get highlighted, they must be referenced in the *.scm files, unless the parser author chose to reuse already present structures). The parsers get updated and compiled on the end user's side as soon as the change has gone through our CI and been committed (rolling release). Other distribution strategies include managing the parsers via a plugin manager or via binary releases (a parser pack, or the regular Neovim release, which includes the parser for the C language, with more to be added).

The parsers usually use the terminology of the language specification and can reuse BNF language specs if available. So there is mostly no need for customization, as customization can be done via SCM files and the parser just follows official specs or existing parsers for the language. New parsers might have frequent updates in the beginning until they cover all features of a language, but at some point they are usually complete and only get a few commits a year. As with syntax files, it is definitely not necessary to always be on the latest revision.

icedman

Sep 1, 2022, 11:04:33 AM

to vim/vim, Subscribed

Try editing this in neovim with treesitter on:
https://code.jquery.com/jquery-3.6.1.js
10,000+ lines of code

Try even scrolling through sqlite3.c in neovim with treesitter
200,000+ lines of code

Plain vim has no problem with these files. Granted, it would be rare editing very large files. But when something vim could do previously well is no longer possible - it should be considered a regression.

The title of the proposal is simply a better syntax highlighting.

Treesitter should be another proposal or something for the future, perhaps when Vim is multithreaded.

Christian Clason

Sep 1, 2022, 12:22:46 PM

to vim/vim, Subscribed

> Plain vim has no problem with these files.

Yes, because Vim has a parsing timeout and a limited parsing window, which tree-sitter in Neovim does not (yet). It's important not to compare apples with oranges here. Unqualified claims like

> And textmate is the best answer.

do not help; at the very least I would have expected a benchmark here comparing (fairly!) the timings between regex highlighting, nvim-treesitter, and your textmate plugin for these files.

Bram Moolenaar

Sep 1, 2022, 12:24:29 PM

to vim/vim, Subscribed

> Try editing this in neovim with treesitter on:
> https://code.jquery.com/jquery-3.6.1.js
> 10,000+ lines of code
>
> Try even scrolling through sqlite3.c in neovim with treesitter
> 200,000+ lines of code
>
> Plain vim has no problem with these files. Granted, it would be rare
> editing very large files. But when something vim could do previously
> well is no longer possible - it should be considered a regression.

It's not that rare. I worked on a compiler that produced C code, one
big file for the whole program. Others have mentioned generated XML.

If we are going to introduce a new way of syntax highlighting, it must
be able to handle this. Even when that is going to be difficult.

This means the parser must be able to start at some point in the file.
It may look back for a point to synchronize, but always starting at the
top of the file isn't going to be sufficient. It may very well mean
this is a different "mode" where some information is missing. While for
regular sized files everything is available.
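
A minimal Python sketch of the "look back for a point to synchronize" idea follows. It is only an illustration under stated assumptions, similar in spirit to what `:syntax sync` already does, and not a proposal for the actual implementation. The sync rule (a closing brace in column 0), the lookback limit and the names `SYNC_RE` and `find_sync_line` are all hypothetical.

```python
import re
from typing import List

# Hypothetical sync rule: a "}" in column 0 ends a top-level block in C-like
# code, so the parser state right after that line is assumed to be known.
SYNC_RE = re.compile(r"^\}")


def find_sync_line(lines: List[str], target: int, max_lookback: int = 200) -> int:
    """Return the index of the line where parsing should start for `target`.

    Scans backwards from `target` for a sync point, at most `max_lookback`
    lines. If none is found, parsing starts at the lookback limit and the
    state there is only a guess (the "different mode" with missing info).
    """
    lowest = max(0, target - max_lookback)
    for i in range(target - 1, lowest - 1, -1):
        if SYNC_RE.match(lines[i]):
            return i + 1             # start on the line after the sync point
    return lowest
```

The cost of finding a start line is bounded by `max_lookback` and does not depend on the file size, which is the property needed for a 200,000 line file.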

Also keep in mind that it must be able to handle deleting and inserting
lines and still be fast. Also when that is a thousand lines. You don't
want to wait more than a second when moving code around.

Adding a new engine is going to be something that needs to be done
properly, it is a big investment.

--
It is too bad that the speed of light hasn't kept pace with the
changes in CPU speed and network bandwidth. -- ***@***.***>

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

errael

Sep 1, 2022, 12:58:21 PM

to vim/vim, Subscribed

> Adding a new engine is going to be something that needs to be done properly, it is a big investment.

I hope that if/when the time comes for working on this, it is thought of as

Adding an engine interface allowing different implementations to be used

As discussed in #9087 (comment)

Linwei

Sep 1, 2022, 1:31:59 PM

to vim/vim, Subscribed

Some information:

I was reading VSCode's latest documentation and found the following.

In the end, it seems that VSCode did not choose to integrate tree-sitter directly, but instead provided some APIs that allow extensions to supply new highlighting solutions.

Currently, VSCode has two highlighting solutions:

traditional textmate highlighting: https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide
semantic highlighting interface: https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide

Semantic highlighting is an addition to syntax highlighting as described in the Syntax Highlight guide. Visual Studio Code uses TextMate grammars as the main tokenization engine. TextMate grammars work on a single file as input and break it up based on lexical rules expressed in regular expressions.

Semantic tokenization allows language servers to provide additional token information based on the language server's knowledge on how to resolve symbols in the context of a project. Themes can opt in to use semantic tokens to improve and refine the syntax highlighting from grammars. The editor applies the highlighting from semantic tokens on top of the highlighting from grammars.
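
The layering described in that quote can be sketched in a few lines of Python. This is an assumption-laden illustration, not VSCode's implementation: tokens are plain `(start, end, group)` column ranges on one line, the function names `overlay` and `to_runs` are hypothetical, and `Parameter` is a made-up group name next to the familiar `Type` and `Identifier`.

```python
from typing import List, Optional, Tuple

Token = Tuple[int, int, str]   # (start column, end column exclusive, group)


def overlay(length: int, grammar: List[Token],
            semantic: List[Token]) -> List[Optional[str]]:
    """Paint grammar tokens first, then semantic tokens on top of them."""
    cells: List[Optional[str]] = [None] * length
    for layer in (grammar, semantic):          # the later layer wins
        for start, end, group in layer:
            for col in range(max(0, start), min(end, length)):
                cells[col] = group
    return cells


def to_runs(cells: List[Optional[str]]) -> List[Token]:
    """Collapse per-column groups back into contiguous token ranges."""
    runs: List[Token] = []
    for col, group in enumerate(cells):
        if group is None:
            continue
        if runs and runs[-1][2] == group and runs[-1][1] == col:
            runs[-1] = (runs[-1][0], col + 1, group)
        else:
            runs.append((col, col + 1, group))
    return runs


line = "int count = total;"
grammar_tokens = [(0, 3, "Type"), (4, 9, "Identifier"), (12, 17, "Identifier")]
semantic_tokens = [(12, 17, "Parameter")]      # e.g. reported by a language server
result = to_runs(overlay(len(line), grammar_tokens, semantic_tokens))
print(result)  # [(0, 3, 'Type'), (4, 9, 'Identifier'), (12, 17, 'Parameter')]
```

The grammar layer alone already gives a usable result, so the semantic layer can arrive later or not at all without leaving the line unhighlighted.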

bfrg

Sep 1, 2022, 2:12:09 PM

to vim/vim, Subscribed

The textmate parser everyone keeps referring to relies on the oniguruma regex library which contains approximately 80k lines of code. Is this even an option to integrate it into Vim? Users will have to learn a new regex flavor just for writing syntax files. On the other hand, if Vim uses its own regex engine, all the existing textmate syntax files won't work, or will they?

I would like to see a comparison between Vim's syntax highlighting and textmate for a more complicated filetype, like C++, bash or similar. The author keeps suggesting textmate but hasn't shown anything (at least a screenshot comparison). Where does textmate shine exactly? And what exactly is easier to express in textmate's syntax files?

Bram Moolenaar

Sep 1, 2022, 2:55:20 PM

to vim/vim, Subscribed

> > Adding a new engine is going to be something that needs to be done
> > properly, it is a big investment.
>
> I hope that if/when the time comes for working on this, it is thought of as
> ```
> Adding an engine interface allowing different implementations to be used
> ```

> As discussed in https://github.com/vim/vim/issues/9087#issuecomment-1209586519

Providing an interface shifts the problem elsewhere, it doesn't solve
it. You still need a good endpoint for the interface, otherwise it's
useless. And unless that endpoint is one good solution, this will
result in multiple alternatives, making it more difficult for the user
who "just want it to work". And for someone who wants to support a
certain language (without spending too much time on it) creates extra
decisions to be made. It's a lot better to say "this is what you do"
instead of "you could do this, or that, or the other".

--
I have read and understood the above. X________________

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Bram Moolenaar

Sep 1, 2022, 4:02:38 PM

to vim/vim, Subscribed

> The textmate parser everyone keeps referring to relies on the
> [oniguruma](https://github.com/kkos/oniguruma/) regex library which
> contains approximately 80k lines of code. Is this even an option to
> integrate it into Vim? Users will have to learn a new regex flavor
> just for writing syntax files. On the other hand, if Vim uses its own
> regex engine, all the existing textmate syntax files won't work, or
> will they?

Thank you very much for making this remark. I would say this is a deal
breaker. Not only because of using some non-standard regex syntax, also
because this library is, well, obscure? Have a look at one of the main
files: regcomp.c
(https://github.com/kkos/oniguruma/blob/master/src/regcomp.c)
There is no comment anywhere. No explanation of what the purpose of a
function is, what the arguments mean, nothing.
The file that looks like the main engine is riddled with C macros:
https://github.com/kkos/oniguruma/blob/master/src/regexec.c
And also no comments anywhere.

This makes it very difficult to maintain. Or perhaps I should say
"impossible"?

Also, what kind of regex engine is this? Backtracking, NFA, something
else? There is a FAQ - it has two entries, one of them says "there is
no mailing list".

> I would like to see a comparison between Vim's syntax highlighting and
> textmate for a more complicated filetype, like C++, bash or similar.
> The author keeps suggesting textmate but hasn't shown anything (at
> least a screenshot comparison). Where does textmate shine exactly? And
> what exactly is easier express in textmate's syntax files?

I think it's a given that the current Vim syntax engine is not ideal.
It was initiated long ago, completely depends on runtime pattern
recognition, etc. It has been optimized over the years, and it's amazing
that it is still an acceptable working solution.

It does function as the base to compare with. Any new syntax engine
needs to work a lot better than the existing one, otherwise it's not
worth switching over.

--
BEDEVERE: And what do you burn, apart from witches?
FOURTH VILLAGER: ... Wood?
BEDEVERE: So why do witches burn?
SECOND VILLAGER: (pianissimo) ... Because they're made of wood...?
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

icedman

Sep 1, 2022, 6:09:41 PM

to vim/vim, Subscribed

> Yes, because Vim has a parsing timeout and a limited parsing window, which tree-sitter in Neovim does not (yet). It's important not to compare apples with oranges here. Unqualified claims like

Just run an eye test like I said. Try opening the said files. You don't need to make a benchmark.

Re: Textmate is the best answer (yes - is probably biased). Let me change that - Textmate is the best immediate solution. Virtually everyone else uses it because the top IDEs use it - sublime text, atom, vscode, intellij (i think). Hence, it already has a wide language coverage.

icedman

Sep 1, 2022, 6:38:23 PM

to vim/vim, Subscribed

> This means the parser must be able to start at some point in the file.
> It may look back for a point to synchronize, but always starting at the
> top of the file isn't going to be sufficient.

Treesitter, from what I understand (and I have only used it a little), always starts from the top of the buffer and requires access to the entire buffer. (Correct me if I'm wrong.)

You could make subsequent parses faster by telling it which parts of the buffer have changed before running the parse. Treesitter is from the Atom guys (I think). Atom runs the parser on a separate thread, using its fast buffer snapshot feature.
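
To give an idea of what "telling it which parts of the buffer have changed" involves, here is a small Python sketch that derives a changed range from the old and new buffer text by stripping the common prefix and suffix. This is not tree-sitter's API, and `changed_range` is a hypothetical name. It works on characters rather than bytes, and a real editor would normally know the edit directly instead of diffing the text.

```python
from typing import Tuple


def changed_range(old: str, new: str) -> Tuple[int, int, int]:
    """Return (start, old_end, new_end) describing one edit.

    old[start:old_end] was replaced by new[start:new_end]; everything
    before `start` and after the two end offsets is identical.
    """
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1                   # skip the common prefix
    old_end, new_end = len(old), len(new)
    while (old_end > start and new_end > start
           and old[old_end - 1] == new[new_end - 1]):
        old_end -= 1                 # skip the common suffix
        new_end -= 1
    return start, old_end, new_end
```

An incremental parser that receives such a range only has to reconsider the parts of its tree that overlap it, which is where the speedup for subsequent parses comes from.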

Mathias Fußenegger

Sep 2, 2022, 6:08:23 AM

to vim/vim, Subscribed

> Virtually everyone else uses it because the top IDEs use it - sublime text, atom, vscode, intellij (i think).

I think virtually everyone is a bit of an exaggeration

Atom is dead: https://github.blog/2022-06-08-sunsetting-atom/
Do you have a source for intellij? Looking at https://plugins.jetbrains.com/docs/intellij/implementing-lexer.html and given that there is a textmate plugin: https://www.jetbrains.com/help/idea/textmate.html#import-textmate-bundles I suspect by default it doesn't use textmate

That would leave vscode and sublime text, when considering the more popular editors.

Looking at tree-sitter:

neovim
zed
helix
There are also efforts to bring it to emacs: https://emacs-tree-sitter.github.io/

And although VS Code doesn't use tree-sitter, it's still used at GitHub: https://github.blog/2021-12-09-introducing-stack-graphs/

This should at least give some confidence that a) tree-sitter is not dead, and b) the quality of the parsers will improve, and Vim joining the effort could help.

> Let me change that - Textmate is the best immediate solution.

Best by what metric?

Regarding the performance:

It could be true that the initial parse with tree-sitter is slower (numbers?), but I think for a fair comparison one also needs to take re-parsing into consideration when making edits to a document. Given that people use editors to edit documents, that's kinda important.

And one of the goals of tree-sitter is that:

> Fast enough to parse on every keystroke in a text editor

I think actual numbers would help this discussion a lot

icedman

Sep 2, 2022, 9:47:16 AM

to vim/vim, Subscribed

> I think actual numbers would help this discussion a lot

Yes, actual numbers would be much better. But then you'd have to code something first for vim.
So I went ahead and made a treesitter plugin.

https://github.com/icedman/vim-treesitter

Time is better spent coding than debating. I'm a lawyer by the way ;)

The plugin is highly experimental, and this implementation currently cheats: it doesn't parse the entire buffer, only what's visible, with some lookahead and lookback. This way it can jump to and parse anywhere in the doc.
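A minimal sketch of how such a visible-range window might be computed, in Python for brevity. The function name and the 200-line margins are assumptions made for illustration; they are not the values the plugin actually uses.

```python
def parse_window(top: int, bottom: int, line_count: int,
                 look_back: int = 200, look_ahead: int = 200) -> tuple[int, int]:
    """Return the 1-based inclusive line range to hand to the parser.

    `top` and `bottom` are the first and last visible lines; the margins
    are clamped so the range never leaves the buffer.
    """
    first = max(1, top - look_back)
    last = min(line_count, bottom + look_ahead)
    return first, last


# A 50-line window near the top of a 1000-line buffer.
window = parse_window(top=1, bottom=50, line_count=1000)
```

Because the parser only ever sees this slice, a construct that opens before `first` (a block comment, say) is invisible to it, which is where the artifacts come from.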

It can open sqlite3.c (200K lines) and jquery (20K lines) without a problem. It has some artifacts where portions of the parsed buffer result in errors.

This is also still very inefficient as it constantly re-parses the entire visible buffer. But the treesitter parse is indeed fast.

I will probably attempt parsing the entire document and updating the tree the way the library is supposed to be used.

Stephan Seitz

Sep 2, 2022, 10:28:19 AM

to vim/vim, Subscribed

When considering tree-sitter performance: the experience will be totally different depending on the parser (and the grammar rules that created it). Incremental parsing performance also depends drastically on the parser: some will invalidate the whole buffer on certain characters, while this will never happen for other grammars. As @icedman commented, even whole-buffer parsing on each keystroke is possible for most parsers (Neovim has no incremental parsing for injected languages yet, but Helix does).

Shane-XB-Qian

Sep 2, 2022, 11:31:35 AM

to vim/vim, Subscribed

> This is also still very inefficient as it constantly re-parses the entire visible buffer. But the treesitter parse is indeed fast.

I tried to build and set up your vim-treesitter plugin (though I only built it for ..._c), testing on vim/src/main.c as an example:

1. I'm not sure whether the highlighting was more accurate than Vim's native syntax (maybe yes?).

2. As for performance, I saw it load the CPU heavily when I kept pressing j from top to bottom...

// my laptop may not be happy about that :-)

Shougo

Sep 2, 2022, 11:43:22 AM

to vim/vim, Subscribed

Treesitter parsing is more accurate.
But it is slow, especially on huge C files.
So I cannot use treesitter for C code.

icedman

Sep 2, 2022, 12:17:49 PM

to vim/vim, Subscribed

The scope-to-Vim-highlight mapping in the vim-textmate plugin and the node-type-to-Vim-highlight mapping in the vim-treesitter plugin are both very incomplete. Visually, you wouldn't appreciate the difference.

But you can run :TxmtDebugScopes and :TSDebugNodes to see what the parsers see.

I did the treesitter plugin not so much to test its speed, but to see whether a special mode (a "cheat mode" for large documents) is possible with treesitter (as mentioned by Bram). This is where parsing is not done on the entire document but only partially. It turns out this could be possible. Treesitter is very "fault tolerant", as advertised.

The error portions of the tree can be handled by the native Vim syntax highlighter.

Shane-XB-Qian

Sep 2, 2022, 12:42:06 PM

to vim/vim, Subscribed

> But you can run :TxmtDebugScopes and :TSDebugNodes to see what the parsers see.

As for TextMate, what may be more interesting is its highlighting accuracy, given its existing grammar resources, I guess...
// Vim's native syntax files are sometimes broken (for specific filetypes), so that would be an alternative, like calling 119 for help :-)

Stephan Seitz

Sep 2, 2022, 1:49:29 PM

to vim/vim, Subscribed

> I did the treesitter plugin not so much to test its speed. But to see whether a special mode - "cheat mode" for large documents is possible with treesitter (as mentioned by Bram). This is where parsing is not done on the entire document but only partially. It turns out this could be possible. Treesitter is very "fault tolerant" as advertised.

It could be done, though the architecture of tree-sitter was built precisely so that you don't need to do that, and to avoid the artifacts that this windowing has when it comes to paired syntax tokens like {/}, "/" and (/) for Lisp.

Tree-sitter allows you to set a deadline in microseconds for parsing and querying; there's a guarantee that the function call takes no longer than the limit you set. The idea was to do synchronous parsing with a deadline set and then fork the parsing to a background thread. Each parsing state is immutable (copy-on-write), so the asynchronous parsing can race against another synchronous request. Continuing a parse that hit its deadline keeps the progress already achieved. So for a really big file you would need to wait a bit for the first asynchronous syntax highlight and then update using synchronous incremental parsing. Neovim does not use the deadline feature or asynchronous parsing yet.

tree-sitter-c needs 240ms to parse itself (2.4MiB); tree-sitter-cpp needs 900ms to parse itself (9.8MiB). It should be possible to wait for that as long as it does not block your typing. Highlighting can still be done windowed (querying and setting highlights), but it can use the accurate AST of the whole file. You could even use both strategies and use the windowed parsing approach only while you wait for the whole-file parse to finish in the background.
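To make the "deadline, then resume" idea concrete, here is a toy Python sketch. It is not tree-sitter's API: `ResumableScan` is a hypothetical stand-in that only tracks brace depth per line, and the injectable `clock` parameter is an assumption added so the behaviour can be tested deterministically.

```python
import time


class ResumableScan:
    """Toy stand-in for a parser: records brace depth at the end of each line."""

    def __init__(self, lines):
        self.lines = lines
        self.next_line = 0     # progress survives across run() calls
        self.depth = 0
        self.depth_after = []  # brace depth after each processed line

    @property
    def done(self):
        return self.next_line >= len(self.lines)

    def run(self, deadline, clock=time.monotonic):
        """Work until finished or until clock() reaches `deadline`.

        Returns True when the whole input has been processed. A False
        result keeps the work done so far; calling run() again resumes.
        """
        while not self.done and clock() < deadline:
            line = self.lines[self.next_line]
            self.depth += line.count("{") - line.count("}")
            self.depth_after.append(self.depth)
            self.next_line += 1
        return self.done


scan = ResumableScan(["a {", "b {", "}", "}"])
# Synchronous slice with a 2 ms budget; unfinished work is picked up later.
finished = scan.run(deadline=time.monotonic() + 0.002)
if not finished:
    scan.run(deadline=float("inf"))  # in an editor: a background thread
```

In an editor the second call would run off the main thread, with the windowed highlight shown until it completes.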

If you edit the file you either have small changes that can be incrementally parsed or you paste 10MiB into the file that will trigger background parsing.

Whether the windowed approach is already good enough for you is, of course, a question of preference.

icedman

Sep 2, 2022, 8:05:24 PM

to vim/vim, Subscribed

> Neovim does not use the deadline feature or asynchronous parsing yet.

So this is why it is very slow on large files. Even querying the tree (merely moving the cursor around) looks like slow blocking calls.

> You could even use both strategies and use the windowed parsing approach only while you wait for the whole-file parse to finish in the background.

That is the idea. Native syn can augment the highlights. And using the complete AST when it becomes available or synced is a great idea.

I'm thinking of implementing the incremental updates. But it looks very cumbersome to do through a Lua-and-C plugin.

errael

Sep 4, 2022, 11:35:14 PM

to vim/vim, Subscribed

> Providing an interface shifts the problem elsewhere

(for discussion say scanner is regex and a parser is grammar;
either can be the basis of a language plugin)

I would think that part of developing the interface would be to
provide an implementation based on a winner: TreeSitter, TextMate or
some dark horse.

A benefit of an interface is that vim internally can focus on
integrating/using results and supporting optional async operation;
like managing the partial/window results as discussed by @icedman. A
complex language like c++ could have both scanner and parser
solutions. Scanner results are used until parser results are
available. In addition, a language plugin can provide info for
folding, indent, ...
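To make the idea concrete, here is a rough Python sketch of what such an interface might look like. Everything in it (`LanguagePlugin`, `FixedPlugin`, `best_highlights`, the tuple layout) is hypothetical; nothing like it exists in Vim.

```python
from abc import ABC, abstractmethod

Highlight = tuple[int, int, int, str]  # (line, start_col, end_col, group)


class LanguagePlugin(ABC):
    """Hypothetical interface a scanner or parser plugin would implement."""

    @abstractmethod
    def ready(self) -> bool:
        """True once results for the buffer are available."""

    @abstractmethod
    def highlights(self, first: int, last: int) -> list[Highlight]:
        """Highlights for lines first..last (inclusive)."""


class FixedPlugin(LanguagePlugin):
    """Stand-in plugin that returns canned results."""

    def __init__(self, is_ready, result):
        self.is_ready = is_ready
        self.result = result

    def ready(self):
        return self.is_ready

    def highlights(self, first, last):
        return [h for h in self.result if first <= h[0] <= last]


def best_highlights(plugins, first, last):
    """Use the first ready plugin; `plugins` is ordered most accurate first."""
    for plugin in plugins:
        if plugin.ready():
            return plugin.highlights(first, last)
    return []


parser = FixedPlugin(False, [(1, 0, 3, "Type")])       # still parsing
scanner = FixedPlugin(True, [(1, 0, 3, "Identifier")])  # instant but rough
shown = best_highlights([parser, scanner], 1, 1)
```

Once `parser.ready()` turns true, the same call returns the parser's results instead.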

I'm assuming the current syntax files, VimSyn, still work; VimSyn is
the first implementation and supports many languages. This thread is
all about a 2nd implementation, formalizing how an implementation
interfaces with vim seems worth the effort. The extra effort might
even be small compared to the overall task.

> someone who wants to support a certain language (without spending
> too much time on it) creates extra decisions to be made

Isn't it sufficient to say "do it this way first", with a brief
comment to look elsewhere for other more complex techniques? If the
"certain language" is simple then the default method probably works
well; if it's complex then the extra decisions are probably pretty
minor compared to the overall task, and probably worth taking the time
to consider.

In many/most cases a simple scanner solution, VimSyn, is good enough;
some cases are greatly enhanced by a parser solution (inherently more
complex and time consuming to implement). Having only one implementation
seems a problem: either inaccurate results or too much complexity.
Additionally, there might be existing parser/scanners that aren't done
in the single integrated language plugin implementation or some new
general parser/scanner might emerge. If someone is willing to make it
available to vim, it's difficult, if not impossible, to do if there is
no vim interface.

If a substantial effort is going to be made in this area, locking in a
single 3rd party solution seems the wrong choice.

Bram Moolenaar

Sep 5, 2022, 8:26:11 AM

to vim/vim, Subscribed

> > Providing an interface shifts the problem elsewhere
>
> (for discussion say `scanner` is regex and a `parser` is grammar;
> either can be the basis of a `language plugin`)
>
> I would think that part of developing the interface would be to
> provide an implementation based on a _winner_: TreeSitter, TextMate or
> some dark horse.
>
> A benefit of an interface is that vim internally can focus on
> integrating/using results and supporting optional async operation;
> like managing the partial/window results as discussed by @icedman. A
> complex language like c++ could have both scanner and parser
> solutions. Scanner results are used until parser results are
> available. In addition, a language plugin can provide info for
> folding, indent, ...
>
> I'm assuming the current syntax files, VimSyn, still work; VimSyn is
> the first implementation and supports many languages. This thread is
> all about a 2nd implementation, formalizing how an implementation
> interfaces with vim seems worth the effort. The extra effort might
> even be small compared to the overall task.

Theoretically, supporting all possible options will give the best
results. However, we also have to be practical. Consider the lack of
maintenance of existing syntax files. For various reasons only a few
get updated regularly. The effort is quickly too much and maintainers
disappear. We don't have unlimited resources.

Also, if we have an API, then Vim must support the maximum set of
features of all possible API endpoints. That is complicated. And if we
make it asynchronous it's even more difficult to get right.

There is also the LSP effort, which adds on top of this. But it probably
is no replacement (I don't think there will be a server for every one of
the 300 languages that Vim currently highlights). Roughly, highlighting can
be done without understanding the code (e.g., whether a name is a
variable or a function) and LSP is used for understanding the code.
Let's not try to make the parser for highlighting also understand the
code, that makes it more complicated and most likely won't work well
(think included/imported files).

To be realistic, it's better to have one solution that works well for
more than 90% of the languages (and OK for the rest). And makes it easy
to add and maintain a language. Only then can we expect supporting 300
languages with a new implementation.

--
hundred-and-one symptoms of being an internet addict:
4. Your eyeglasses have a web site burned in on them.

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Shane-XB-Qian

Sep 5, 2022, 9:35:30 AM

to vim/vim, Subscribed

> We don't have unlimited resources.

Then it seems TextMate is the only option?
Just make it a *standby* syntax that can use/borrow ^H^H^H^H^H^H^H existing resources, and Microsoft (VSC) has unlimited resources. :lol

--
shane.xb.qian

Blay263

Sep 9, 2022, 10:29:39 PM

to vim_dev

Do we need some kind of vote or is Bram leaning one way?

Bram Moolenaar

Sep 10, 2022, 7:17:45 AM

to vim...@googlegroups.com, Blay263

> Do we need some kind of vote or is Bram leaning one way?

This is not something to make a quick choice over. Ideally we see all
alternatives implemented, see how well each one works and then decide.
But that is an awful lot of work and throwing away much of it.

We should start with setting some requirements. Some will be hard
requirements, some will be preferences or "good to have".

A hard requirement is that displaying and editing text should not be
noticeably slowed down by highlighting.

Having the highlighting show up asynchronously, or become more accurate
after a delay, is considered bad, only to be used if there is no other
way. For the bulk of the files it should be instantaneous. We have had
quite a few users complain about flickering, we don't want to introduce
more of that.

A very good thing to have (but not a hard requirement) is being able to
use language specifications that other editors use.

A hard requirement is that it must be possible to use a language
specification (fetch it from github) without installing tools (compiler,
builder program, etc.). Thus the specification must be usable on any
computer with just Vim installed. Providing it pre-compiled in some way
is fine (although taking care of versions might add new problems).

We would prefer to use the regexp engine we already have, not add another
one (not only for code size, also to avoid having to learn yet another
syntax). Possibly a regexp syntax that is close enough can be
translated into Vim's regexp syntax.
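
As a rough idea of what such a translation could involve, here is a minimal
Python sketch. It assumes the source patterns are Oniguruma/PCRE-style and
targets Vim's "very magic" (`\v`) mode. It only rewrites non-capturing groups
and lazy quantifiers and escapes seven characters that are literal in PCRE but
special after `\v`. Lookarounds and named groups are rejected, and escapes
such as `\b` are passed through unchanged even though they mean something
different in Vim. The function name and the chosen subset are assumptions.

```python
LAZY = {"*?": "{-}", "+?": "{-1,}", "??": "{-,1}"}
# Literal in PCRE, but special in Vim after \v.
VERY_MAGIC_SPECIALS = set("<>=@%~&")


def to_very_magic(pattern: str) -> str:
    """Translate a small PCRE-style subset into a Vim \\v pattern."""
    out = ["\\v"]
    i = 0
    in_class = False  # inside [...]: copy characters untouched
    while i < len(pattern):
        ch = pattern[i]
        two = pattern[i:i + 2]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(two)  # escapes are passed through as-is
            i += 2
        elif in_class:
            in_class = ch != "]"
            out.append(ch)
            i += 1
        elif ch == "[":
            in_class = True
            out.append(ch)
            i += 1
        elif pattern.startswith("(?:", i):
            out.append("%(")  # non-capturing group
            i += 3
        elif two == "(?":
            raise ValueError("unsupported group at offset %d" % i)
        elif two in LAZY:
            out.append(LAZY[two])
            i += 2
        elif ch in VERY_MAGIC_SPECIALS:
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


vim_pattern = to_very_magic("(?:foo|bar)+?")
```

A real translator would need a proper tokenizer and a list of constructs that
cannot be expressed at all; this only shows the flavour of the rewrites.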

It should be fairly easy to create and maintain a language
specification. About the same as with the current syntax highlighting
(or easier, hopefully). It must be able to handle any language (some
better than others).

One important question is which kind of language specification works
best for specifying the structure and getting good highlighting,
independent of the engine used to execute it? After all, if it is
difficult to write or doesn't get the correct highlighting, then we
won't get people writing them.

If needed, we could accept more than one specification. E.g. one that
other editors use and one that works better (hopefully with the same
engine). We'll have to also support the old syntax highlighting anyway.

--
Micro$oft: where do you want to go today?
Linux: where do you want to go tomorrow?
FreeBSD: are you guys coming, or what?

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\

icedman

Sep 10, 2022, 7:45:05 PM

to vim/vim, Subscribed

Here is another investigation into Textmate. This time in Ruby:
https://github.com/icedman/vim-textpow

This is very, very fast compared to the Lua-with-C version I made earlier. This uses a very old TextMate implementation (github.com/grosser/textpow). It still requires some updates. But the code looks very maintainable at only 844 total lines, including the Vim plugin, with the bonus that Ruby looks nice to the eye. Oniguruma is baked into Ruby 2.0. It is not that obscure after all.

The Lua-with-C version:
https://github.com/icedman/vim-textmate

This is more complete and can render TextMate themes. But it lags when scrolling too fast with pageup or pagedown, so I needed to employ some deferred highlighting cheats. It also needs build tools to compile the C module and is not as self-contained as the textpow version.

I guess TextMate can live as a Ruby or Lua plugin until a new syntax highlighter is developed or the current one is improved.

icedman

Sep 10, 2022, 7:55:35 PM

to vim/vim, Subscribed

Someone also pointed out to me this project
https://github.com/trishume/syntect
Claims to be a fast textmate parser highlighter. It is in Rust.

Bram Moolenaar

Sep 11, 2022, 8:32:10 AM

to vim/vim, Subscribed

> Someone also pointed out to me this project
> https://github.com/trishume/syntect
> Claims to be a fast textmate parser highlighter. It is in Rust.

Looking around it appears to me that TextMate grammar is based on
regular expressions. It uses matches to find syntax items, which may
then contain other items, thus creating a grammar.

Isn't this what the Vim syntax already offers? I wonder what the big
advantage is. I don't see how it can be faster, most time will still be
spent on regexp matching.

A quote from the document referenced below:
With begin/end, if the end pattern is not found, the overall
match does not fail: rather, once the begin pattern is matched,
the overall match runs to the end pattern or to the end of the
document, whichever comes first. The underlying architectural
reason is that the TextMate parser does not backtrack; once the
begin pattern is matched, it is matched successfully and that’s
that — TextMate can’t change its mind and decide that it
shouldn’t have matched the begin pattern after all.

Isn't that exactly how Vim's syn-region works?
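
The begin/end behaviour described in that quote can be sketched in a few
lines of Python. This is only an illustration: it uses Python's `re` module
rather than the engine TextMate uses, it searches a whole string instead of
working line by line, and `find_region` is a made-up name.

```python
import re


def find_region(text: str, begin: str, end: str):
    """Return (start, stop) of the first begin/end region, or None."""
    opened = re.search(begin, text)
    if opened is None:
        return None
    closed = re.compile(end).search(text, opened.end())
    # The begin match is never reconsidered: if no end pattern follows,
    # the region simply runs to the end of the text.
    stop = closed.end() if closed else len(text)
    return opened.start(), stop


closed_string = find_region('x = "abc" + y', '"', '"')
open_string = find_region('x = "abc', '"', '"')
```

The unterminated string in the second call still produces a region, which is
the same outcome a `:syn region` without a matching end gives.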

I do find some clear disadvantages. Such as that the patterns only work
within one line (Vim's regexp can cross line boundaries). And lack of
errors/warnings, which can make it very difficult to get patterns right.
I also get confused by the many file formats involved (at least json,
yaml and xml).

I found this document from someone writing a TextMate grammar. It's
eight years old, but I don't suppose much has changed since then.
https://www.apeth.com/nonblog/stories/textmatebundle.html

I thought we were looking for the ultimate solution, not for "it's not
great but it's the best we could find".

--
The 50-50-90 rule: Anytime you have a 50-50 chance of getting
something right, there's a 90% probability you'll get it wrong.

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\

/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Doug Kearns

Sep 11, 2022, 11:08:40 AM

to vim/vim, Subscribed

Yes, the premise of this report has always seemed a bit dubious to me. I was going to investigate further and actually run some tests but haven't done so. Thanks @icedman.

The only real advantage I see to supporting TextMate would be the corpus of grammar files. It has a couple of nice features that are missing in Vim, like the ability to specify submatches limited to the start and end regions, but these could be added.

I occasionally investigate specific TextMate grammars to see how someone else might have handled a tricky case I'm trying to solve in a Vim syntax file, but I can't recall seeing anything to suggest that the capabilities of that system are significantly better than what's currently available in Vim. Generally, I find that they don't have a solution or at best have implemented one similar to my own.

Even Microsoft, with all their resources, can't generate a C# grammar that I don't regularly find bugs in.

I have also experienced some truly horrific highlighting performance with C++ and TypeScript, some of which was bad enough to bring down VS Code, and recall finding plenty of highlighting performance related bug reports when I investigated.

Most of the other grammar options seem more expressive than TextMate, and other systems like Tree-sitter offer something extra, like AST generation.

I'm aware this commentary is next to useless but I have my 2c as well...

Bram Moolenaar

Sep 11, 2022, 12:32:25 PM

to vim/vim, Subscribed

> Yes, the premise of this report has always seemed a bit dubious to me.
> I was going to investigate further and actually run some tests but
> haven't done so. Thanks @icedman.
>
> The only real advantage I see to supporting TextMate would be the
> corpus of grammar files. It has a couple of nice features that are
> missing in Vim like the ability to specify submatches limited to the
> start and end regions but these could be added.

I wonder if we could convert a TextMate grammar into a Vim syntax file.
Would be worth a try. If the conversion is written in Vim script we can
improve it over time. Unless we run into something that just won't
work.

--
hundred-and-one symptoms of being an internet addict:

39. You move into a new house and setup the Wifi router before
unpacking any kitchen stuff.

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

errael

Sep 11, 2022, 5:36:38 PM

to vim/vim, Subscribed

> I wonder if we could convert a TextMate grammar
> into a Vim syntax file.

VimSyn is there, works and currently has ~300 languages. Just as
interesting a question as "can TextMate be translated to VimSyn?" is:
given Vim's experience and what's out there, can a superior, even
non-compatible, version of VimSyn be specified, and the current grammars
then translated to it?

I continue to think that an interface is the way to go, rather than
picking this year's winner. I'd forgotten about LSP and thought it
didn't support syntax highlighting, but taking another look I saw a
bunch of stuff about work to add syntax highlighting API; I don't know
if that ever happened. This could be "the interface"?

I wonder if something like TextMate or TreeSitter could be front-ended
and/or made available through LSP.

> Having the highlighting show up asynchronously, or become more
> accurate after a delay, is considered bad, only to be used if there
> is no other way. For the bulk of the files it should be
> instantaneous. We have had quite a few users complain about
> flickering

Does changing the color of a word cause flicker? If there's changes,
it means the first highlights were inaccurate. Would users choose to
have the display change to more accurate highlighting?; primarily
during startup of a new file. A scanner will never be as good as a
parser; I haven't used c++ for 20+ years, but I suspect some useful,
accurate highlight info (eg macros, templates, errors) would be
appreciated.

Of course, how performance is achieved is an implementation detail.
But having vim explicitly/API interoperate with capabilities like
"parse these areas of interest first" and "incremental parsing" and
"better results available for this area from the engine" might be
important. Does LSP handle these kinds of interaction?

Focusing on a fast internal syntax engine, optionally supplemented by
LSP when pinpoint accuracy is desirable. I guess managing two engines,
internal and LSP, at once is a big deal. So without VimSyn+LSP, if a
user chose a certain LSP, it might take a while for the initial syntax
highlights to show up. But with a good LSP front-ended implementation,
handling incremental parsing and windows of interest, after the
initial delay it should usually keep up.

I understand that LSP can go beyond syntax highlighting.

icedman

Sep 12, 2022, 12:32:36 AM

to vim/vim, Subscribed

> I wonder if we could convert a TextMate grammar into a Vim syntax file.

Most other new editors (new relative to Textmate the app) adopted the textmate format because of the "corpus of grammar files" available (https://code.visualstudio.com/blogs/2017/02/08/syntax-highlighting-optimizations).

"corpus of grammar files" is not an insignificant advantage.

This leads me to agree that "converting textmate to vim syntax" is worth investigating. And it doesn't have to be fully compatible.

Treesitter converts json and js files into C modules. Using their grammar config files is also worth looking into.

Bram Moolenaar

Sep 12, 2022, 5:53:47 AM

to vim/vim, Subscribed

> > I wonder if we could convert a TextMate grammar
> > into a Vim syntax file.
>
> VimSyn is there, works and currently has ~300 languages. Just as an
> interesting question is "can TextMate be translated to VimSyn?"; given
> Vim's experience and what's out there, can a superior, even non
> compatible, version of VimSyn be specified? and then current grammar
> translated to it?

I think with VimSyn you mean the existing syntax engine. From my quick
browsing I would say the TextMate grammar can be converted to syntax
regions and matches. Perhaps we need to tweak the way items can be
contained in other items.
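
A minimal sketch of that conversion for the two simplest rule shapes, written
in Python only for illustration. It assumes the patterns are already valid
Vim regexps (translating the regexp dialect is a separate problem) and
ignores nested `patterns`, captures and includes. The `tm` group prefix and
all function names are made up.

```python
def vim_group(scope: str) -> str:
    """'string.quoted.double' -> 'tmStringQuotedDouble'."""
    parts = scope.replace("-", ".").split(".")
    return "tm" + "".join(p.capitalize() for p in parts if p)


def delimit(pattern: str) -> str:
    """Wrap a pattern in a delimiter character it does not contain."""
    for d in '/+"':
        if d not in pattern:
            return d + pattern + d
    raise ValueError("no free delimiter for " + pattern)


def to_vim_syntax(rule: dict) -> str:
    """Emit one :syntax command for a TextMate match or begin/end rule."""
    group = vim_group(rule["name"])
    if "match" in rule:
        return f"syntax match {group} {delimit(rule['match'])}"
    if "begin" in rule and "end" in rule:
        return (f"syntax region {group} "
                f"start={delimit(rule['begin'])} end={delimit(rule['end'])}")
    raise ValueError("unsupported rule")


region_cmd = to_vim_syntax(
    {"name": "string.quoted.double", "begin": '"', "end": '"'})
match_cmd = to_vim_syntax({"name": "comment.line", "match": "//.*$"})
```

Here `region_cmd` is `syntax region tmStringQuotedDouble start=/"/ end=/"/`.
Linking the generated groups to standard highlight groups, and deciding which
items may be contained in which, would still have to be added.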

> > Having the highlighting show up asynchronously, or become more
> > accurate after a delay, is considered bad, only to be used if there
> > is no other way. For the bulk of the files it should be
> > instantaneous. We have had quite a few users complain about
> > flickering
>
> Does changing the color of a word cause flicker? If there's changes,
> it means the first highlights were inaccurate. Would users choose to
> have the display change to more accurate highlighting?; primarily
> during startup of a new file. A scanner will never be as good as a
> parser; I haven't used c++ for 20+ years, but I suspect some useful,
> accurate highlight info (eg macros, templates, errors) would be
> appreciated.

Yes, color changes will be very noticeable. Especially for things that
can be seen a lot. E.g. if a word could be a keyword or a variable
name, and this changes after two seconds, that's very strange. Perhaps
"flicker" isn't the right word for it.

If you would want a "quick first pass that mostly gets it right" and
then a more accurate one that's slower, this means implementing two
engines. Not an attractive solution.

Also keep in mind that users tend to jump around files a lot, not
spending more than a second before jumping to the next place (I do it
all the time). During that second the text is looked at to see if it's
the right place. The highlighting should be correct then. And this
currently works, we would not want to give up on it.

> Of course, how performance is achieved is an implementation detail.
> But having vim explicitly/API interoperate with capabilities like
> "parse these areas of interest first" and "incremental parsing" and
> "better results available for this area from the engine" might be
> important. Does LSP handle these kinds of interaction?

I doubt it. AFAIK LSP gives you a fully parsed version of the code.

> Focusing on a fast internal syntax engine, optionally supplemented by
> LSP when pinpoint accuracy is desirable. I guess managing two engines,
> internal and LSP, at once is a big deal. So without VimSyn+LSP, if a
> user chose a certain LSP, it might take a while for the initial syntax
> highlights to show up. But with a good LSP front-ended implementation,
> handling incremental parsing and windows of interest, after the
> initial delay it should _usually_ keep up.
>
> I understand that LSP can go beyond syntax highlighting.

It can be used to look up information about a function, argument, etc.
The kind of things you would show in a popup window. For highlighting
it might be too slow, I'm not sure. In a project with code the server
for LSP would already have parsed the files beforehand.

--
hundred-and-one symptoms of being an internet addict:

46. Your wife makes a new rule: "The computer cannot come to bed."

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Christian Clason

Sep 12, 2022, 6:01:54 AM

to vim/vim, Subscribed

PSA: LSP is a red herring here, it's irrelevant to the topic at hand.

LSP is about project-level "intelligence", meaning gathering and using cross-file information. While one of its newer features is "semantic highlighting" (which allows you to highlight, e.g., variables in one file differently if they are declared as const in another), this is not its main purpose, and the LSP interface is a very poor fit for general syntax highlighting. Here, in fact, the OP's point applies: language servers are external programs, and the communication overhead means there'd be unacceptable latency.

(Think of it rather as providing an additional layer of more detailed highlights for some objects where additional information is available.)

mattn

Sep 12, 2022, 6:07:10 AM

to vim/vim, Subscribed

> I doubt it. AFAIK LSP gives you a fully parsed version of the code.

It depends on the Language Servers, but many Language Servers can accept half-formed source code and generate completion suggestions based on what is being typed.

icedman

Sep 16, 2022, 7:42:24 AM

to vim/vim, Subscribed

Some testing with tree-sitter in native C

https://github.com/icedman/vim/tree/treesitter
https://github.com/icedman/vim/blob/treesitter/src/TREESITTER.md

loading and parsing sqlite3.c (220K+ lines of code) takes around 1.1sec
editing a single line (incremental tree update) takes around 0.15sec (visible lag)
scrolling - instantaneous as no tree update is required

using windowing mode (partial parse of only 2000 lines)

loading 0.017sec
editing a single line 0.03sec
scrolling 0.05sec (only when the cursor is nearing the edge of the 2000-line window)

Notes:
The numbers don't account for rendering of highlights, only for tree parsing and updates.
Treesitter reads through the entire buffer even on single-line updates. Still fast, but I'm not sure how efficient this is, as I used ml_get_buf.
In contrast, a textmate parser can be updated by feeding it only the edited line and the parser state of the previous line. Updates are then done on the succeeding lines if necessary.
Tree-sitter can do an initial parse of 220K+ lines in 1.1 seconds; that is fast. In contrast, I did a test on textmate some time ago: parsing through 220K+ lines in a single go required 15 seconds.
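
To illustrate the line-by-line update described in the notes above, here is a minimal Python sketch. It is not code from tiny-textmate or Vim: the toy "grammar" only tracks whether a line ends inside a /* */ comment, and the names end_state, parse_all and update are made up for this example. Re-parsing starts at the edited line with the stored state of the previous line, and stops as soon as a line's end state comes out the same as before.

```python
import re

# Hypothetical toy grammar: the only state carried across lines is
# "are we inside a /* ... */ comment?".
OPEN = re.compile(r"/\*")
CLOSE = re.compile(r"\*/")


def end_state(line, in_comment):
    """Return True if `line` ends inside a block comment, given its start state."""
    pos = 0
    while True:
        match = (CLOSE if in_comment else OPEN).search(line, pos)
        if match is None:
            return in_comment
        in_comment = not in_comment
        pos = match.end()


def parse_all(lines):
    """Full parse: states[i] is the parser state at the end of line i."""
    states, state = [], False
    for line in lines:
        state = end_state(line, state)
        states.append(state)
    return states


def update(lines, states, changed):
    """Re-parse after line `changed` was edited; return how many lines were visited.

    Starts from the stored end state of the previous line and stops once a
    line's new end state equals the stored one, since later lines cannot differ.
    """
    state = states[changed - 1] if changed > 0 else False
    visited = 0
    for i in range(changed, len(lines)):
        state = end_state(lines[i], state)
        visited += 1
        if states[i] == state:
            break
        states[i] = state
    return visited
```

For example, opening a comment on the first line only forces a re-parse down to the line that closes it; an edit that does not change the end state touches a single line.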

Windowing mode is fast. Whether 2000 lines of partial parse is acceptable is another question. For a full treesitter parse, I think a problem that needs to be solved is querying a very large tree and translating it to highlights, indents or whatever else it can be used for. I skimmed through nvim-treesitter, it looks like it utilizes a lot of caching.

icedman

Sep 29, 2022, 8:09:22 AM

to vim/vim, Subscribed

tree-sitter clone in javascript
https://lezer.codemirror.net/

possible in javascript = possible in vimscript?

Stephan Seitz

Sep 29, 2022, 9:42:19 AM

to vim/vim, Subscribed

> I skimmed through nvim-treesitter, it looks like it utilizes a lot of caching.

None of the highlighting logic is in nvim-treesitter. Upstream nvim only queries the updated ranges as reported by the parser after an incremental parse.

Yee Cheng Chin

Oct 2, 2022, 10:52:07 AM

to vim/vim, Subscribed

Late to the discussion here, but when people talk about the wealth of syntax highlighting files in TextMate format (due to VSCode's popularity), can people be more specific? Vim also has 300+ file formats supported and is a popular text editor in its own right, so I'm curious if there are specific examples of a decently popular format that has awesome syntax highlighting elsewhere but is non-existent in Vim? And is that only because of a lack of interest in maintaining a Vim-specific syntax file, or a genuine technical roadblock?

Looking through the thread, I'm still not sure what the supporting argument for pursuing TextMate's syntax format is other than "it's what the other guys are using". That's a poor argument, especially when the people using it (e.g. VSCode) also have a giant thread discussing adopting tree-sitter. The supposed gains seem a little hand-wavy and quite minor, as both Vim's and TextMate's systems are fundamentally regex engines and the TextMate system is quite old by now. Adding a new system like this is a huge endeavor, so picking something that could at best be a little better seems like a bad idea to me considering its cost (both in implementation and maintenance).

For tree-sitter, it does look like a more ideal solution, but I'm still a little concerned about the lack of true semantic highlighting (e.g. C++ macros as mentioned above), and whether that's just adopting a not-quite-there solution. I'm also not exactly sure how people are supposed to distribute third-party tree-sitter plugins. Binary releases seem like a big regression from human-readable .vim files (the alternative to binary release is distributing tree-sitter source files that people need to compile themselves which is also not great, unless Vim bundles a compiler…).

icedman

Oct 3, 2022, 2:30:17 AM

to vim/vim, Subscribed

> I'm also not exactly sure how people are supposed to distribute third-party tree-sitter plugins.

I've been poking around treesitter. It could be possible to define a grammar from a vim file. Looking at the generated parser for C, it looks like it contains several parsing tables which could be imported from a file. It has a lexer function which looks like it could be converted into a table as well.

From what I understand, treesitter allows defining your own scanner, such as in the parser for CPP: a scanner coded by hand rather than generated. It may also be possible to allow callbacks from vimscript to implement the scanner.

icedman

Oct 3, 2022, 2:49:26 AM

to vim/vim, Subscribed

> And is that only because of lack of interest in maintaining a Vim-specific syntax file

Could be.

That is why it could be worth investigating if converting textmate grammars is possible.

It could also be worth investigating how far off Vim syntax is from textmate highlights. Maybe a tweak or a minor feature add-on could improve it greatly.

I've been poking around textmate too, going so far as to implement it in C :) .. github.com/icedman/tiny-textmate ; it is small enough that it could live in a browser: github.com/icedman/wasm-tiny-textmate

In terms of speed when parsing a whole document, there is no way textmate can compete with treesitter (even if someone vastly improves my code). There is probably also no way it could compete with vim syn well enough to justify replacing it.

Yee Cheng Chin

Oct 3, 2022, 5:36:39 PM

to vim/vim, Subscribed

> That is why it could be worth investigating if converting textmate grammars is possible.

Right. I feel like that should be the first thing to try before we start incorporating the entire TextMate system into Vim. But step 0 should really be finding concrete examples first (as I mentioned). Maybe something like TypeScript, which is a Microsoft-developed technology (since Microsoft also makes VSCode)? It's useful to at least find out what the "worst case" situation is to begin with.

Back to tree-sitter. I'm not sure if it's necessarily the best technology for this (I still have concerns about the project's design which relies on pre-compiled binaries, which makes it hard to distribute plugins), but I do think something like it (which provides context-free grammar support) is beneficial and the obvious next step for providing more accurate syntax highlighting and/or code understanding (e.g. you can ask the editor to give you the scope of a function with this).

I guess my annoyance with the discussion here is still that there seems to be a lot of hearsay, comparison by analogy, deferral to authority ("the other editors are using this"), and focus on implementation specifics, instead of a more fundamental / principled discussion of what kind of properties we want from a syntax highlighting engine. TextMate and tree-sitter are quite different technologies, so whether we want to adopt each should be a separate discussion rather than a single "just pick a syntax engine" one (since Vim already has one). It's easier if we could establish 1) what problems exactly we are trying to solve here, and 2) what properties we need.

To me, some of the existing problems with Vim's syntax highlighting are:

Performance in large files. Vim can get quite slow in parsing large files, and there isn't a good way to interrupt it (see this StackOverflow thread).
Syncing issues. Sometimes Vim syntax highlighting can become out of sync in a large file (see syn-sync). This is tied to the performance issue since we can't always start parsing from the beginning of the file.
Correctness of highlighting. Vim uses regex matching, which is fundamentally limiting. It doesn't always understand what is a class and what is a variable, etc.
Lack of support. I'm only putting it here as it seems to be the sentiment on this thread, but I'm not sure if it's a real problem considering how popular Vim is. Most popular languages do have Vim plugins/syntax files available, though not all of them are available as part of the default Vim runtime bundles.

Some properties I think we would like (note that not all of them are always solvable, and some properties could work against each other):

1. Performance
   i. Fast. Performance should be good and be able to parse the vast majority of files quickly.
   ii. Can handle large files without choking.
   iii. Ability to start parsing not from the beginning of the file every time.
   iv. Incremental updates (this is related to (ii) above). E.g. tree-sitter supports incremental update of the tree under some circumstances.
2. Async / background thread processing. May be a little controversial because this is more complicated and introduces latency, but I think that's better than a 3-second lag during which you can't type anything. Eventually we are going to have a large-enough file that even a performant parser can't completely parse in 100ms or so. The system we use should be designed to be thread-safe so we have the option and escape hatch to process asynchronously, while not killing latency.
3. Easy to write and distribute third-party plugins.
   i. Ideally the system is easy to learn, and doesn't require a CS degree. A bonus is if it's already widely used or similar to existing systems.
   ii. This probably means a good way to distribute text-only plugins. Binaries are always tricky to deal with because the distributor probably doesn't have a way to easily compile Linux/Windows/macOS (x86 and ARM)/etc. binaries, and there are security concerns as well (is the average Linux plugin author going to pay $100/year to Apple just to sign/notarize their plugin binary? I don't think so). Otherwise you are forcing your users to compile their own binaries, which is annoying.
4. Correctness. Context-free grammar (used in tree-sitter) is definitely an upgrade from the regex solution that Vim / TextMate use. I wonder if it makes (1.iii.) above impossible though (which some other comments already alluded to). Also, semantic highlighting is still best, but doing it properly requires running a compiler (through LSP or not) and probably needs to include other files as well (imagine a C/C++ file with complicated macros / includes), so it seems like it will never be good enough as a first-layer solution: it will be slow, language-specific, and complicated to set up.
   i. The correctness aspect also allows Vim to expose the syntax tree to plugins and the user.
   ii. The ability to handle errors is also important to pay attention to, in that ideally it's stable if you just made a typo.
   iii. Hmm, actually looking through tree-sitter more, it cheats on the context-free aspect to handle cases like Python, since Python isn't context-free (indentation parsing is context-aware), so it cheats by using an external scanner. I couldn't find how it handles C++ macros though.
5. Ease of integration. Just putting it here, because some projects that look good on paper can be near-impossible to integrate due to code quality / dependencies etc. (think gcc vs llvm).
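
As a rough illustration of why Python's indentation needs state outside a context-free grammar, here is a small Python sketch of the usual trick: a hand-written scanner keeps a stack of indentation widths and emits INDENT / DEDENT tokens that the grammar can then treat like brackets. This is an assumption-laden toy, not tree-sitter-python's actual scanner; the function name and token shapes are invented for the example, tabs are not handled, and inconsistent dedents are not reported.

```python
def indent_tokens(lines):
    """Return a flat token list with INDENT/DEDENT markers around LINE tokens."""
    stack = [0]  # indentation widths of the currently open blocks
    tokens = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change the indentation level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        tokens.append(("LINE", line.strip()))
    # Close every block that is still open at end of input.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens
```

The stack is exactly the kind of state that has to be saved and restored if parsing is to resume somewhere other than the start of the file.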

Anyway I digress, just my 2c.

icedman

Oct 3, 2022, 9:54:05 PM

to vim/vim, Subscribed

"editor uses this x x x" .. this should not be understated.

It's hard enough to make a new syntax engine. It's harder, or at least exponentially more work, to create new grammars, and this relies on individuals creating a grammar for their favorite language.

There's a lot of work already done in textmate. We would be wise to at least see what can be reused to help improve the existing engine.

Treesitter is the future (my humble opinion). Nvim-treesitter will most probably improve eventually.
It's also a good idea to join in the effort there.

Bram Moolenaar

Oct 4, 2022, 5:13:32 PM

to vim/vim, Subscribed

Many good remarks, thanks.

If we find a way to use existing language descriptions from other
engines, that obviously is a big win. Especially if they are working
well. I guess TextMate comes closest to what we already have, it is
worth a try to convert them. Or even use the TextMate description with
an adapted Vim syntax engine. Where can we find a representative one?

If I understand it correctly, both Vim syntax and TextMate define
regions with patterns, which can contain other regions and eventually
items that are highlighted. This involves scanning the text and at each
point trying to match the list of patterns that might appear there
until one is found that matches.

This can be slow if there are many patterns to match, especially if the
pattern is not rejected quickly. E.g. "(some.*)" needs to scan a lot of
text before deciding the ")" cannot be found.
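
A minimal Python sketch of the scanning loop described above, to make the cost visible. The rules and group names are made up for the example, and this is neither Vim's nor TextMate's real engine: at every position each pattern is tried in order until one matches, so the work grows with (number of positions) x (number of patterns), and a pattern like "(some.*)" may scan to the end of the line before it can be rejected.

```python
import re

# Hypothetical rule list: (highlight group, pattern), tried in this order.
RULES = [
    ("Comment", re.compile(r"#.*")),
    ("String", re.compile(r'"[^"]*"')),
    ("Number", re.compile(r"\d+")),
]


def highlight(line):
    """Return (start, end, group) spans found by trying every rule at every position."""
    spans = []
    pos = 0
    while pos < len(line):
        for group, pattern in RULES:
            match = pattern.match(line, pos)
            if match and match.end() > pos:
                spans.append((pos, match.end(), group))
                pos = match.end()
                break
        else:
            pos += 1  # no rule matched here; advance one character
    return spans
```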

When using a grammar, which I believe treesitter does, all possible
symbols can be put in a state diagram, pushing states at every
character. Much like an NFA regexp this looks at each text character
once and updates the possible matches, until one sequence remains.

Compiling into an executable binary is out of the question. You
mentioned some reasons, I think distributing binaries is very tricky and
having the user compile won't work in general (many systems don't have a
compiler). It would be possible to compile into some kind of byte code,
but it must be a simple compiler, otherwise it gets too big. We already
have the regexp program "compiler" and the Vim9 :def function compiler,
something like that could work.

Looking at some treesitter output, the bulk is a table with states.
That doesn't require a compiler (like Vim spell files are generated into
a binary form). Also note that what treesitter produces is huge. I
don't know how representative it is, I found this parser for typescript:
https://raw.githubusercontent.com/tree-sitter/tree-sitter-typescript/master/typescript/src/parser.c
This is 6 Mbyte of C code, 200'000 lines.

Is this what one has to type to specify this?
https://github.com/tree-sitter/tree-sitter-typescript/blob/master/common/corpus/declarations.txt

In my opinion performance is not top priority. Sure, there are some
languages that are slow, but even the current Vim engine works fast
enough in most cases. We don't need to make C or Java highlighting
faster, it is already good. We only need to worry about corner cases,
such as a very large file with XML.

--
Ten bugs in the hand is better than one as yet undetected.

Stephan Seitz

Oct 10, 2022, 11:18:24 PM

to vim/vim, Subscribed

> Is this what one has to type to specify this?
> https://github.com/tree-sitter/tree-sitter-typescript/blob/master/common/corpus/declarations.txt

No, this is a test file: TypeScript source text and the expected parsing result alternate (separated by ------). The grammar specification is always called grammar.js.

The typescript parser is confusing since it contains two parser definitions which share a common JS file https://github.com/tree-sitter/tree-sitter-typescript/blob/master/common/define-grammar.js (called from tsx/grammar.js and typescript/grammar.js). The grammar definition has a dialect argument which makes distinctions between tsx and typescript possible. For all other languages, the grammar definition is the grammar.js in the root directory (e.g. https://github.com/tree-sitter/tree-sitter-cpp/blob/master/grammar.js).

> It would be possible to compile into some kind of byte code,
> but it must be a simple compiler, otherwise it gets too big. We already
> have the regexp program "compiler" and the Vim9 :def function compiler,
> something like that could work.

There were also complaints about the Node JS requirement for tree-sitter-cli. tree-sitter-cli is a kind of compiler: it has a front end which currently invokes Node JS https://github.com/tree-sitter/tree-sitter/blob/3563fe009aa3cf373ae01782979743e6aa258a0a/cli/src/generate/mod.rs#L171-L192. The output of the frontend is src/grammar.json https://github.com/tree-sitter/tree-sitter/blob/3563fe009aa3cf373ae01782979743e6aa258a0a/cli/src/generate/dsl.js#L418 which contains a full description of the grammar. The backend generates C code which is dynamically loadable by the tree-sitter runtime https://github.com/tree-sitter/tree-sitter/tree/master/lib/src.

It is certainly possible to move the compilation to runtime or first load time. All the runtime expects is a TSLanguage https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/parser.h#L90-L127. See the last lines of src/parser.c for the TSLanguage returned by a generated file. If Vim wrote such a compiler, it could also be used as an alternative loading mechanism in Neovim and Helix.

When Neovim or Helix want to load a language, e.g. the language foo, they search for a symbol called tree_sitter_foo. They then call tree_sitter_foo(), which returns the TSLanguage the runtime expects. If Vim or upstream tree-sitter can provide a runtime compiler for a new grammar description language, they could also load a TSLanguage by calling grammar_runtime_compiler("path/to/grammar/definition.file"), with no changes to the tree-sitter runtime required. The initial revision of "path/to/grammar/definition.file", which can be in a human-editable format, could either be generated by a modified tree-sitter-cli with a changed compiler backend, or popular languages could be ported manually.
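
To make the symbol lookup concrete, here is a hedged Python sketch using ctypes. It only mirrors the mechanism described above (look up tree_sitter_<name> in a shared library, call it, get back an opaque TSLanguage pointer); it is not how Neovim or Helix are implemented, the helper names are invented, and library_path is a hypothetical path to a library built from a generated src/parser.c.

```python
import ctypes


def language_symbol(name):
    """Entry-point name for a language, following the tree_sitter_<name> convention."""
    return "tree_sitter_" + name


def load_language(library_path, name):
    """Load a compiled parser and return its TSLanguage pointer as an integer.

    `library_path` is a hypothetical shared library containing the generated
    parser; the returned pointer is treated as opaque here.
    """
    lib = ctypes.CDLL(library_path)
    entry = getattr(lib, language_symbol(name))  # AttributeError if the symbol is missing
    entry.restype = ctypes.c_void_p  # const TSLanguage *
    entry.argtypes = []
    return entry()
```

A runtime compiler as proposed above would replace the CDLL step: instead of loading machine code, it would build the same TSLanguage tables in memory from a grammar description file.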

Bram Moolenaar

Oct 11, 2022, 3:40:53 AM

to vim/vim, Subscribed

> > Plain vim has no problem with these files.
>
> Yes, because Vim has a parsing timeout and a limited parsing window,
> which tree-sitter in Neovim does not (yet). It's important to compare
> apples with oranges here. Unqualified claims like
>
> > And textmate is the best answer.
>
> do not help; at the very least I would have expected a benchmark here
> comparing (fairly!) the timings between regex highlighting,
> nvim-treesitter, and your textmate plugin for these files.

Very true. Open claims about "my thing works better/faster than your
thing" are useless. These things can be measured and therefore must be
measured. If we have some solutions to choose between we should have
benchmarks to measure the performance. This can also be used to tune
and improve each solution.

--
BEDEVERE: Why do you think she is a witch?
SECOND VILLAGER: She turned me into a newt.
BEDEVERE: A newt?
SECOND VILLAGER: (After looking at himself for some time) I got better.
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

Bram Moolenaar

Oct 11, 2022, 8:54:13 AM

to vim/vim, Subscribed

> > Is this what one has to type to specify this?
> > https://github.com/tree-sitter/tree-sitter-typescript/blob/master/common/corpus/declarations.txt
>
> No, this is a test file. Typescript source text and expected parsing
> result alternating (separated by `------`). The grammar specification
> is always called `grammar.js`

All this stuff appears to suffer from lack of comments.

> The typescript parser is confusing since it contains two parser
> definitions which share from a common JS file
> https://github.com/tree-sitter/tree-sitter-typescript/blob/master/common/define-grammar.js
> (called from `tsx/grammar.js` and `typescript/grammar.js`). The
> grammar definition has a dialect argument which makes distinctions
> between tsx and typescript possible. For all other languages, the
> grammar definition is the grammar.js in the root directory (e.g.
> https://github.com/tree-sitter/tree-sitter-cpp/blob/master/grammar.js).

A C++ parser is likely to be one of the most complex ones, not good as
an example. Also amazing that someone can write this without any
comments to explain what is what or how it refers to the standard.

> > It would be possible to compile into some kind of byte code,
> > but it must be a simple compiler, otherwise it gets too big. We already
> > have the regexp program "compiler" and the Vim9 :def function compiler,
> > something like that could work.
>
> There were also complaints about the Node JS requirement for
> tree-sitter-cli. tree-sitter-cli is a kind of compiler: it has a front
> end which currently invokes Node JS
> https://github.com/tree-sitter/tree-sitter/blob/3563fe009aa3cf373ae01782979743e6aa258a0a/cli/src/generate/mod.rs#L171-L192.
> The output of the frontend is `src/grammar.json`
> https://github.com/tree-sitter/tree-sitter/blob/3563fe009aa3cf373ae01782979743e6aa258a0a/cli/src/generate/dsl.js#L418
> which contains a full description of the grammar. The backend
> generates C code which is dynamically loadable by the tree-sitter
> runtime
> https://github.com/tree-sitter/tree-sitter/tree/master/lib/src.

We need to separate what would be needed for someone writing a language
definition and an average Vim user. When working on a language it's
fine to require tools needed for that, so long as they are widely
available. For a Vim user installing Node JS, C compiler, and things
like that is out of the question.

> It is certainly possible to move the compilation to runtime or first
> load time. Everything the runtime expects is a TSLanguage
> https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/parser.h#L90-L127.
> See the last lines of `src/parser.c` for the TSLanguage returned by a
> generated file. If Vim would write such a compiler, it could be used
> as alternative loading mechanism also in Neovim and Helix.

I was more thinking of distributing compiled byte code. It would still
need to be in the form of a language description, rather than code that
is executed (to avoid trojan horses). If the byte code is more or less
readable, or turned into something readable with a Vim command, that
would be a big plus.

The tree-sitter-cli is actually a nice starter for understanding how it
works: https://github.com/tree-sitter/tree-sitter-cli

--
ARTHUR: I am your king!
WOMAN: Well, I didn't vote for you.
ARTHUR: You don't vote for kings.
WOMAN: Well, 'ow did you become king then?
The Quest for the Holy Grail (Monty Python)

Stephan Seitz

Oct 11, 2022, 9:31:25 AM

to vim/vim, Subscribed

> I was more thinking of distributing compiled byte code. It would still
> need to be in the form of a language description, rather than code that
> is executed (to avoid trojan horses). If the byte code is more or less
> readable, or turned into something readable with a Vim command, that
> would be a big plus.

Tree-sitter uses WASM to run on the web. There were ideas for Neovim to run a WASM runtime as a plugin host.

> The tree-sitter-cli is actually a nice starter for understanding how it
> works: https://github.com/tree-sitter/tree-sitter-cli

tree-sitter-cli now lives here: https://github.com/tree-sitter/tree-sitter/tree/master/cli. Tree-sitter-cli is an ahead-of-time compiler and not needed by an end user when the compilation result is distributed (WASM, binaries, C code, or something new tailored for the needs of Vim). I was arguing that tree-sitter-cli is just one compiler implementation (Node JS frontend, Rust backend) and an alternative one could be written with a different grammar DSL. Editors typically ship only the tree-sitter runtime (a C library without dependencies). Any function that can return a TSLanguage struct should work with the C runtime.

> I was more thinking of distributing compiled byte code. It would still
> need to be in the form of a language description, rather than code that
> is executed (to avoid trojan horses)

src/grammar.json contains all the information needed. It can surely be represented in a more compact binary representation. Arbitrary code execution is only used for scanners. For C++, the scanner is needed only because some state has to be stored, and that is only the case for raw strings (https://www.geeksforgeeks.org/raw-string-literal-c/): for R"delimiter( raw_characters )delimiter" it needs to remember the delimiter it parsed to determine the appropriate closing token.
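
A small Python sketch of the raw-string case, to show what "state" means here. It is not the tree-sitter-cpp scanner; the function name is made up, and the C++ rules on which characters a delimiter may contain (and its maximum length) are deliberately ignored. The only point is that the closing token depends on text captured at the opening token.

```python
def scan_raw_string(text, start=0):
    """Return the index just past the C++ raw string literal at `start`, or -1."""
    if not text.startswith('R"', start):
        return -1
    open_paren = text.find("(", start + 2)
    if open_paren == -1:
        return -1
    # This is the state the scanner has to remember between open and close.
    delimiter = text[start + 2:open_paren]
    closing = ")" + delimiter + '"'
    end = text.find(closing, open_paren + 1)
    return -1 if end == -1 else end + len(closing)
```

Note that a plain `)"` inside the literal does not end it when a delimiter is in use, which is why a fixed end pattern is not enough.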

> A C++ parser is likely to be one of the most complex ones, not good as
> an example.

Sorry for that. C++ is also a bad example since most of the code lives in tree-sitter-c and rules are only extended in tree-sitter-cpp.

Isopod

Oct 11, 2022, 9:32:52 AM

to vim/vim, Subscribed

> A C++ parser is likely to be one of the most complex ones, not good as an example.

Try the JSON one: https://github.com/tree-sitter/tree-sitter-json/blob/master/grammar.js

But it’s worth noting that Tree-Sitter requires a custom lexer (implemented in C/C++) for many languages such as Python that cannot be parsed by a context-free grammar alone. So distributing grammars as byte-code might be challenging. Even if it were possible, that would make syntax highlighting even slower, and Tree-Sitter is already quite slow in my experience.

I actually did a performance comparison of several editors a few weeks ago, but unfortunately still haven’t gotten round to publishing the results.

Bram Moolenaar

Oct 11, 2022, 11:48:43 AM

to vim/vim, Subscribed

> > The tree-sitter-cli is actually a nice starter for understanding how it
> > works: https://github.com/tree-sitter/tree-sitter-cli
>
> tree-sitter-cli lives now here
> https://github.com/tree-sitter/tree-sitter/tree/master/cli.

I was referring to the README.md that is displayed there. It doesn't
show in the new location.

> Tree-sitter-cli is an ahead-of-time compiler and not needed by an end
> user when the compilation result is distributed (WASM, binaries, C
> code, or something new tailored for the needs of Vim). I was arguing
> that tree-sitter-cli is just one compiler implementation (Node JS
> frontend, Rust backend) and an alternative one could be written with
> a different grammar DSL. Editors typically ship only the tree-sitter
> runtime (a C library without dependencies). Any function that can
> return a TSLanguage struct should work with the C runtime.

AFAIK that TSLanguage struct is the result of compiling the C code that
tree sitter produced. Thus it requires compiling on the Vim user side.
That is a deal breaker, we cannot expect a Vim user to compile C code.

If the compilation can be done before, and then the result can be
obtained by the user, that might work. But it can't be in a machine
binary. Perhaps WASM works, not sure. It does require a kind of "build
once, run anywhere" mechanism.

--
DENNIS: Listen -- strange women lying in ponds distributing swords is no
basis for a system of government. Supreme executive power derives
from a mandate from the masses, not from some farcical aquatic
ceremony.

The Quest for the Holy Grail (Monty Python)

icedman

Oct 11, 2022, 5:30:17 PM

to vim/vim, Subscribed

Re treesitter: user-end compilation is so much of a hurdle that it eliminates itself as a possible replacement for syntax highlighting.

I would suggest opening a new issue for it to explore it further.

icedman

Oct 11, 2022, 5:45:23 PM

to vim/vim, Subscribed

Vim syntax can be improved with some of the features from textmate:

1. Captures or sub-matches (I think the regex engine already has this and it just needs to be exposed to the syntax engine?)
2. Captures further evaluated into sub match
3. Dynamic end matches (based on begin captures)
4. Begin/while...in addition to begin/end

.. I think #1 is not too much work and would by itself greatly enhance the existing engine
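
A toy Python sketch of what "dynamic end matches (based on begin captures)" means, using a heredoc-like construct. The begin pattern and the function name are invented for illustration and this is not TextMate's implementation: the begin rule captures a word, and the end rule for that region is built from the captured text, so each region can have a different terminator.

```python
import re

# Hypothetical begin rule: "<<WORD" at the end of a line, WORD is captured.
BEGIN = re.compile(r"<<(\w+)$")


def heredoc_regions(lines):
    """Return (first, last) line-index pairs of heredoc-like regions."""
    regions = []
    i = 0
    while i < len(lines):
        begin = BEGIN.search(lines[i])
        if begin is None:
            i += 1
            continue
        # The end pattern is derived from what the begin pattern captured.
        end = re.compile("^" + re.escape(begin.group(1)) + "$")
        j = i + 1
        while j < len(lines) and not end.match(lines[j]):
            j += 1
        regions.append((i, min(j, len(lines) - 1)))  # unterminated: runs to last line
        i = j + 1
    return regions
```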

Doug Kearns

Oct 12, 2022, 5:21:27 AM

to vim/vim, Subscribed

Is dynamic end matching different from :help :syn-ext-match?

Bram Moolenaar

Oct 12, 2022, 6:20:13 AM

to vim/vim, Subscribed

> Vim syntax can be improved with some of the features from textmate:
>
> 1. Captures or sub-matches (I think the regex engine already has this and it just needs to be exposed to the syntax engine?)
> 2. Captures further evaluated into sub match
> 3. Dynamic end matches (based on begin captures)
> 4. Begin/while...in addition to begin/end
>
> .. I think #1 is not too much work and would by itself greatly enhance the existing engine

Can you provide more details, so that we can have an idea of how
complicated this would be? Ideally with an example (in TextMate) of the
rules and some example text of what it will match.

A next stop would be if we can make a converter from a TextMate grammar
to Vim syntax rules. So we can see if it could work. Perhaps some
adjustments in the syntax command are needed.

--
With sufficient thrust, pigs fly just fine.
-- RFC 1925

icedman

Oct 16, 2022, 8:10:35 PM

to vim/vim, Subscribed

> A next stop would be if we can make a converter from a TextMate grammar
> to Vim syntax rules. So we can see if it could work.

I did try generating a vim syntax file from a textmate grammar file. So far, I could only make simple keyword matches work. The regex engine would complain: "too many parenthesis", and something like "you cannot use this pattern recursively"

> Can you provide more details, so that we can have an idea of how
> complicated this would be?

Will do.
