spell checking contents of only some tags in XML

Vadim Zeitlin

Jun 5, 2009, 6:36:05 PM
to vim...@googlegroups.com
Hello,

I'm trying to enable spell checking for the contents of some tags in a
(custom dialect of) XML, e.g. I have something like this:

    <?xml version="1.0"?>
    <foo>
        <bar>notatext</bar>
        <text>Some text I'd like to spell check</text>
    </foo>

and would like spelling mistakes to be highlighted inside <text> but not
inside <bar> (and not anywhere else, so e.g. "syn spell toplevel" is not a
solution).


My first attempt to do it was

    syn region xmlString start="<text>\zs" end="\ze</text>"
        \ contains=xmlEntity,@Spell

but for some reason this doesn't work and I just can't understand why. It
seems to be somehow related to the definition of xmlTagName in
$VIMRUNTIME/syntax/xml.vim because the above does work in non-XML files.
I also tried

    syn match xmlString "<text>\zs\_[^<]\+\ze</text>"
        \ contains=xmlEntity,@Spell

but this failed to work too. Does anybody have any idea why?


My second attempt was

    syn region xmlString start="\(<text>\)\@<=" end="\(</text>\)\@="
        \ contains=xmlEntity,@Spell

which does work, i.e. it correctly highlights the tags contents as strings
and does detect spelling errors. Unfortunately it also makes Vim unusably
slow, presumably because it tries to match "" following \@<= everywhere.

So my final version is

    syn region xmlString start="\(<text>\)\@<=[A-Z]"
        \ end="\(</text>\)\@=" contains=xmlEntity,@Spell

which relies on the fact that text in my XML files usually starts with an
upper-case letter and, as there are not that many upper-case letters in an
XML file, this works reasonably fast -- although still noticeably slower
than without this definition.

Can anyone suggest a better way of doing this? It looks like "\zs" is
exactly what I need here but I just can't make it work unfortunately
(tested in Vim 7.0 and 7.2 under Windows and 7.1 under Linux, the behaviour
is the same everywhere).

Thanks in advance for any hints!
VZ
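
A possible way to keep the zero-width look-behind from the second attempt while limiting its cost is to give `\@<=` a byte limit, so that Vim only looks back a fixed distance instead of retrying the look-behind over the whole preceding text. The sketch below is an assumption on several counts: it needs Vim 7.4 or later (the `\@N<=` form does not exist in the 7.0 to 7.2 versions tested above), it has not been checked against this XML dialect, and the limit 6 is simply the length of `<text>`.

```vim
" Sketch only: same region as the second attempt, but the look-behind is
" limited to the 6 bytes of "<text>" (see :help /\@<=).
" Assumes Vim 7.4+ and that this is sourced after syntax/xml.vim,
" for example from ~/.vim/after/syntax/xml.vim.
syn region xmlString start="\(<text>\)\@6<=" end="\(</text>\)\@="
      \ contains=xmlEntity,@Spell
```

Unlike the `[A-Z]` variant, this does not depend on how the text inside the tag begins.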

Ben Fritz

Jun 8, 2009, 11:17:27 AM
to vim_use


On Jun 5, 5:36 pm, Vadim Zeitlin <vz-...@zeitlins.org> wrote:
>  Hello,
>
>  I'm trying to enable spell checking for the contents of some tags in a
> (custom dialect of) XML, e.g. I have something like this:
>
>         <?xml version="1.0"?>
>         <foo>
>                 <bar>notatext</bar>
>                 <text>Some text I'd like to spell check</text>
>         </foo>
>
> and would like spelling mistakes be highlighted inside <text> but not
> inside <bar> (and not anywhere else, so e.g. "syn spell toplevel" is not a
> solution).
>
>  [snip]
>
>  Can anyone suggest a better way of doing this? It looks like "\zs" is
> exactly what I need here but I just can't make it work unfortunately
> (tested in Vim 7.0 and 7.2 under Windows and 7.1 under Linux, the behaviour
> is the same everywhere).
>

I think it may be having trouble creating a match that has zero width
for the start and end regions (not sure about this however). You're
right, it looks like adjusting the start of the match is the way to do
what you want. But, I suspect this will work better with the syntax
pattern offsets rather than the regex patterns for doing the same.
See :help :syn-pattern-offset

Vadim Zeitlin

Jun 9, 2009, 8:29:03 AM
to vim_use
On Mon, 8 Jun 2009 08:17:27 -0700 (PDT) Ben Fritz <fritzophre...@gmail.com> wrote:

BF> On Jun 5, 5:36 pm, Vadim Zeitlin <vz-...@zeitlins.org> wrote:
BF> >  Hello,
BF> >
BF> >  I'm trying to enable spell checking for the contents of some tags in a
BF> > (custom dialect of) XML, e.g. I have something like this:
BF> >
BF> >         <?xml version="1.0"?>
BF> >         <foo>
BF> >                 <bar>notatext</bar>
BF> >                 <text>Some text I'd like to spell check</text>
BF> >         </foo>
BF> >
BF> > and would like spelling mistakes be highlighted inside <text> but not
BF> > inside <bar> (and not anywhere else, so e.g. "syn spell toplevel" is not a
BF> > solution).
...
BF> But, I suspect this will work better with the syntax pattern offsets
BF> rather than the regex patterns for doing the same.

Hello and thanks for your answer!

I knew I forgot something in my original message: I did try the offsets
approach too, but without much success. First of all, I thought that I
needed to use region start/end (rs/re), i.e. do this:

    syn region xmlString start="<text>"rs=s+6 end="</text>"re=e-7
        \ contains=xmlEntity,@Spell

But it didn't work at all and the entire line containing <text> was
highlighted as string. So I read the documentation more carefully and
discovered the part between the parentheses:

    This can be used to change the highlighted part, and to change the
    text area included in the match or region (which only matters when
    trying to match other items)

So I tried the same thing with hs/he. This does highlight the string itself
properly, but it removes the highlighting from the <text> tag and the
matching closing tag entirely, which looks a bit ugly.
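
The exact hs/he command is not given in the message. A plausible reconstruction, modelled on the comment example under :help :syn-pattern-offset, might look like the following. The offsets are an assumption: hs=e+1 starts the highlighting one character after the end of the start match, and he=s-1 ends it one character before the start of the end match.

```vim
" Reconstruction, not the author's exact command: the region still spans
" <text>...</text>, but only the text between the tags is highlighted.
syn region xmlString start="<text>"hs=e+1 end="</text>"he=s-1
      \ contains=xmlEntity,@Spell
```

Since the region itself still begins at the `<`, the standard tag items no longer get a chance to match there, which would be consistent with the loss of tag highlighting described above.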

Remembering the "matters when trying to match other items" part I tried
with both hs and rs but this didn't change anything compared to just using
hs, i.e. <text> is still not matched as xmlTagName any more. Just to be
complete, I've also tried using syn match but this behaves in the same way:
I can make highlighting the tag contents work but at the expense of
correctly highlighting the tag itself. And combining ms and hs still
doesn't do anything differently from just using hs AFAICS.
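
One option not tried in this message is `matchgroup`, which highlights the text matched by the start and end patterns with a separate group while the region body keeps its own group. A minimal sketch, assuming that the xmlTagName group from syntax/xml.vim is an acceptable colour for the whole tag (angle brackets included, which the stock syntax handles with separate items) and that the rule is loaded after the standard XML syntax:

```vim
" Sketch only, not from the thread: the <text> and </text> delimiters are
" highlighted as xmlTagName, the body as xmlString with spell checking.
syn region xmlString matchgroup=xmlTagName start="<text>" end="</text>"
      \ contains=xmlEntity,@Spell
```

How this interacts with XML folding or with the other items defined in xml.vim is untested here.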

So I'm still stuck with

(a) Perfect but extremely slow solution using just zero-width matches
(b) Good in practice but rather ugly and still very slow solution using
    a zero-width match followed by a capital letter
(c) Simple but not good-looking solution using pattern offsets

And the most annoying thing remains that "\zs" seems to exist to do exactly
what I want -- but doesn't work for some reason. And it's not due to using
zero width for both start and end as it still doesn't work if you do
something like

    syn region xmlString start="<text>\zsSome" end="check\ze</text>"

either (with the example above). But as soon as you remove \zs it starts
working (\ze does work as expected).

Would anybody have any idea why this is so?

Thanks again,
VZ
