syntax files: how to prevent embedded syntax from leaking out of a region?


Edward McGuire

Jul 21, 2023, 4:18:39 PM
to vim_use
I developed a syntax file for Noweb -- if you're unfamiliar, it's a Literate Programming tool. Noweb source is LaTeX source, but with embedded code blocks. The embedded code blocks are delimited, and I define a syntax region that recognizes the delimiters. Inside a code block, I get highlighting that corresponds to the source code language. Outside, I get LaTeX highlighting.

But I have a problem with Perl highlighting that escapes past the end of a Perl code block. That is, source after the end of the code block is getting formatted as Perl, not as LaTeX.

Not all Perl breaks out -- I can't put my finger on it yet, but specific Perl constructs break out. For example, when I delete an elsif() clause, the problem disappears.
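
One way to narrow down which construct is responsible is to ask Vim which syntax items are active at the cursor. The following is a minimal sketch using the built-in synstack() and synIDattr() functions; the function name SynStackNames is made up for illustration.

```vim
" Hypothetical helper: return the names of all syntax items active at the
" cursor position, outermost first.
function! SynStackNames() abort
  return map(synstack(line('.'), col('.')), 'synIDattr(v:val, "name")')
endfunction

" Usage: put the cursor on text after the code block that is wrongly
" highlighted, then run
"   :echo SynStackNames()
```

If a Perl group name still shows up in the list for text after the end of the block, that group (or one it is contained in) is a candidate for the one that is escaping.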

I'm closely following the examples in the Vim documentation. I'm already supplying the "keepend" argument to the syntax region, which is supposed to limit the embedded syntax to the region.

My working theory is that wherever the Perl syntax rules use "extend", the Perl highlighting can leak out of the region. Here's a quote from the stock perl.vim:
    syn region perlBraces start="{" end="}" transparent extend
The Vim documentation describes "extend" as overriding a "keepend" on a containing item, which would explain the leak.

Given this background, the question is, in my syntax file how can I reliably ALWAYS limit embedded language highlighting to the region that contains it, without the chance that the embedded language can break out?

Cheers!
Edward

Edward McGuire

Sep 14, 2023, 2:22:07 PM
to vim_use
On Friday, July 21, 2023 at 3:18:39 PM UTC-5 Edward McGuire wrote:
> how can I reliably ALWAYS limit embedded language highlighting to the region that contains it, without the chance that the embedded language can break out?

Still looking for a solution to this -- how to prevent pre-installed language syntaxes from breaking out of syntax regions marked "keepend".

Cheers!
Edward

Edward McGuire

Oct 4, 2023, 1:29:47 PM
to vim_use
This question got no replies. Did it fail to be circulated? Does nobody know the answer? Is it a stupid question? :)

Christian Brabandt

Oct 4, 2023, 2:55:55 PM
to vim...@googlegroups.com

On Wed, 04 Oct 2023, Edward McGuire wrote:

> This question got no replies. Did it fail to be circulated? Does nobody know the answer? Is it a stupid question? :)

Either nobody knows the answer or nobody understood the problem.

Thanks,
Christian
--
There is never time to do it right, but always time to do it over.

Edward McGuire

Oct 4, 2023, 5:09:24 PM
to vim_use
On Wednesday, October 4, 2023 at 1:55:55 PM UTC-5 Christian Brabandt wrote:
> Either nobody knows the answer or nobody understood the problem.

Thank you Christian, I'm glad to know it got out there anyway.

So maybe it will help for me to restate the problem, maybe not.

This has to do with syntax highlighting when a file written in one language has embedded in it a code block written in a different language.
The Vim distribution includes two great practical examples of how to do this.
Both examples highlight the code inside the block using different syntax rules than the code outside the block.

One is in help document "syntax.txt" under the tag "sh-embed".
This shows how to extend the sh syntax so that embedded awk code is highlighted using awk syntax.

The other is syntax file "ant.vim", which has a powerful technique to support multiple languages embedded in Ant syntax and highlight each language using the correct syntax file.
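
The general idea of handling several embedded languages with one helper can be sketched as follows. This is not ant.vim's actual code; it is a rough illustration for Noweb-style chunks. The function name s:NowebEmbed, the group and cluster names, and the chunk-name patterns in the two calls are assumptions made for the example, and it presumes a cluster @SyntaxTeX has already been set up for the surrounding text.

```vim
" Hypothetical helper: define highlighting for one embedded language.
"   a:lang     suffix for group and cluster names, e.g. 'Make'
"   a:synfile  syntax file to include, e.g. 'make.vim'
"   a:namepat  very-magic pattern for the chunk name, e.g. '[mM]akefile'
function! s:NowebEmbed(lang, synfile, namepat) abort
  " Syntax files usually finish early when b:current_syntax is set.
  unlet! b:current_syntax
  execute 'syntax include @Syntax' . a:lang . ' syntax/' . a:synfile
  " Chunk introducer: <<name>>= alone on a line, followed by the body.
  execute 'syntax match nowebCodeChunkDeclaration' . a:lang
        \ . ' "\v^[<][<]' . a:namepat . '(|\s.*)[>][>][=]\s*$"'
        \ . ' skipnl containedin=@SyntaxTeX'
        \ . ' nextgroup=nowebCodeChunkBody' . a:lang
  " Chunk body: highlighted with the included syntax, limited by keepend.
  execute 'syntax region nowebCodeChunkBody' . a:lang
        \ . ' keepend contained'
        \ . ' start="\v.*"'
        \ . ' end="\v^([@]($| ))|([<][<].*[>][>][=][ \t]*$)"me=s-1'
        \ . ' contains=@Syntax' . a:lang
  execute 'highlight link nowebCodeChunkDeclaration' . a:lang . ' PreProc'
endfunction

" Illustrative calls; the name patterns are assumptions.
call s:NowebEmbed('Make', 'make.vim', '[mM]akefile')
call s:NowebEmbed('Perl', 'perl.vim', '.*\.p[lm]')
```

Because the function is script-local (s:), this sketch is meant to live inside a syntax file rather than be typed at the command line.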

The problem I'm having is that sometimes code past the end of the block gets highlighted using the syntax rules for the code inside the block.

Here is an excerpt from the syntax I developed.
The excerpt highlights most of the buffer using the preinstalled "tex.vim".
Then, within any embedded code block that is introduced by a <<blockname>>= introducer whose block name begins with "Makefile" or "makefile", and that is terminated by an @ or by another <<blockname>>= introducer, the code is highlighted by the preinstalled "make.vim".

    " Highlight the whole buffer as TeX by default.
    syntax include @SyntaxTeX syntax/tex.vim
    " Clear the flag so that the next included syntax file does not finish early.
    unlet b:current_syntax
    syntax region Normal start=/\v%^/ end=/\v%$/ contains=@SyntaxTeX
    " Code block introducer for Makefile chunks, alone on its line.
    syntax match nowebCodeChunkDeclarationMake "\v^[<][<][mM]akefile(|\s.*)[>][>][=]\s*$"
        \ skipnl
        \ containedin=@SyntaxTeX
        \ nextgroup=nowebCodeChunkBodyMake
    highlight link nowebCodeChunkDeclarationMake PreProc
    " Code block body, highlighted with the stock make syntax.
    syntax include @SyntaxMake syntax/make.vim
    unlet b:current_syntax
    syntax region nowebCodeChunkBodyMake
        \ keepend
        \ start="\v.*"
        \ end="\v^([@]($| ))|([<][<].*[>][>][=][ \t]*$)"me=s-1
        \ contained
        \ contains=@SyntaxMake

The key bit is the "keepend" keyword on the embedded code block.
It is there to force the block terminator to also terminate any contained item.
It should stop the embedded language from reading past the block terminator.
But the embedded language syntax can override this using "extend".
That can cause the embedded language syntax highlighting to continue past the end of the code block delimiter.
And the problem seems to be that some of the preinstalled syntaxes do use "extend".

The question is how to prevent the preinstalled syntaxes from ever highlighting past the end of the delimiter.
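
To see which preinstalled syntaxes are affected, one option is to search their source for the word "extend". The following is a rough sketch; the function name FindExtendLines is invented for illustration, and because it is a plain text search it can report false positives (for example, "extend" inside a pattern).

```vim
" Hypothetical helper: list the lines that mention "extend" in every
" syntax/{name}.vim found on 'runtimepath'.
function! FindExtendLines(name) abort
  let l:hits = []
  for l:file in globpath(&runtimepath, 'syntax/' . a:name . '.vim', 0, 1)
    let l:lnum = 0
    for l:line in readfile(l:file)
      let l:lnum += 1
      " Skip full-line comments; keep lines containing the word "extend".
      if l:line !~# '^\s*"' && l:line =~# '\<extend\>'
        call add(l:hits, printf('%s:%d: %s', l:file, l:lnum, l:line))
      endif
    endfor
  endfor
  return l:hits
endfunction

" Usage:
"   :echo join(FindExtendLines('perl'), "\n")
```

Each reported line is only a candidate; whether it actually causes a leak depends on whether that item can start inside a code block without ending there.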

M

Oct 4, 2023, 9:02:42 PM
to vim...@googlegroups.com
"extend" clearly has the highest priority, so it cannot be overridden. But that's not the problem by itself. The real trouble is misuse of it. It looks like every person out there starts playing with syntaxes by adding "keepend" and "extend" almost everywhere, while "extend" should only be used for those very special elements that essentially "violate normal rules", like escape sequences or such.

In other words, every inner syntax region already extends its outer region by default, so an outer region _may_ add "keepend" if needed. But the inner region _under exceptional circumstances_ may still overpower it all with "extend". Maybe we could add yet another override specifically for syntax-include but, imo, that could lead to even more confusion. Instead, we had better clean that garbage out of the existing syntax files.
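
The interaction can be tried out in a scratch buffer. The sketch below is a toy syntax, not taken from any real syntax file; all group names (demoOuter, demoPlain, demoExtend) and the BEGIN/END delimiters are invented for the demonstration, and it assumes syntax highlighting is already enabled (:syntax enable).

```vim
" Scratch buffer with two test lines.
new
setlocal buftype=nofile
call setline(1, [
      \ 'BEGIN ( unclosed END plain text )',
      \ 'BEGIN [ unclosed END leaked text ]'
      \ ])

" Outer region with keepend: contained items should stop at END.
syntax region demoOuter start="BEGIN" end="END" keepend
      \ contains=demoPlain,demoExtend
" Inner region without extend: subject to the outer keepend.
syntax region demoPlain start="(" end=")" contained
" Inner region with extend: documented to override the outer keepend.
syntax region demoExtend start="\[" end="\]" contained extend

highlight link demoOuter Comment
highlight link demoPlain String
highlight link demoExtend Error
```

Going by the documentation for "keepend" and "extend", the parenthesis region on the first line should be cut off at END, while the bracket region on the second line should run on to the closing bracket. Running :echo map(synstack(line('.'), col('.')), 'synIDattr(v:val, "name")') with the cursor on the text after END is one way to check.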

So, in my opinion, the whole issue is a consequence of having many poorly written syntax files. Main distribution included.

Kind regards,
Matvey


Edward McGuire

Oct 6, 2023, 3:22:57 PM
to vim_use
On Wednesday, October 4, 2023 at 8:02:42 PM UTC-5 M wrote:
> a consequence of having many poorly written syntax files. Main distribution included

Your idea of a different override for syntax-include is the practical solution. As you pointed out, syntax-include depends on the authors of all the syntax files in the distribution being aware of what happens when their syntax is syntax-included. It's the moral equivalent of cooperative multitasking or running a processor in real mode -- you have to trust every other syntax not to step on yours. A region needs some way to absolutely stop an included syntax from highlighting past the region.

My project is a free syntax file for Noweb source files.* A single Noweb source file can contain all the objects in multiple languages that make up a package. Those objects can be represented in the Noweb source file as fragments, to be pieced together by Noweb. For example, an inner syntax region can easily contain the open-brace for a block but not the close-brace.

An included syntax file will seize on that open-brace and highlight forward until it finds a matching close-brace. But it actually needs to stop at the end of the region that contains the code fragment, because the close-brace it finds is likely to be in a different inner syntax region, maybe even one written in a different language!

As a plan "B", I hate to have to reinvent stock syntaxes, and I would probably do a poor job. This is my first major foray into Vim syntax highlighting. I might eventually take a stab at it, for languages I use where I have run into the problem. Plan "C" is to just document the problem, and give advice if I can for writing code fragments that sidestep the problem. Or plan "D", I could disable recognition of certain languages, which is what I do now, but I had been thinking of that as a stopgap measure during development, not a solution.
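
A narrower variant of plan "B" might be to override only the offending items instead of rewriting a whole syntax. The following is an untested sketch under several assumptions: the cluster name SyntaxPerl is illustrative, perlBraces is used only because it is the item quoted earlier in this thread, and other items in perl.vim may need the same treatment. The stock file may also define its items differently depending on user options.

```vim
" Assumption: this runs inside the Noweb syntax file.
unlet! b:current_syntax
syntax include @SyntaxPerl syntax/perl.vim

" Drop the stock definition, which carries "extend" ...
syntax clear perlBraces
" ... and redefine it without "extend", so that an enclosing "keepend"
" region is allowed to cut it off.
syntax region perlBraces start="{" end="}" transparent contained
" As far as I can tell, items defined after the include are not added to
" the cluster automatically, so add the new definition explicitly.
syntax cluster SyntaxPerl add=perlBraces
```

The drawback is the same as with any fork of stock behavior: the override has to be kept in step with changes to the stock syntax file.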
