On Tue, 29 Dec 2020, 'hebar...@googlemail.com' via vim_dev wrote:
> [[:upper:]]\{2,} is not correctly applied, resulting in not finding what
I suppose the problem is that the second and fourth words in the input
aren't matched?
> 05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера ...
      ^^^^^   ^^^^^^
That is an interesting case. There are two peculiarities here:
Vim comes with two different regexp engines, and you can switch between
them using the 'regexpengine' option (see :h 'regexpengine' and
:h two-engines).
By default, it uses automatic mode, which usually selects the NFA
engine; only for some costly patterns does it fall back to the old
backtracking engine.
For some reason, the NFA engine, when used in automatic mode, fails to
compile this regex (however, it doesn't report that it switches
engines :/). I see this in the logfile:
,----
| >>> NFA engine failed...
| Regexp: "\<[[:upper:]]\{2,}\>"
| Postfix notation (char): "NFA_BOW , NFA_START_COLL, NFA_CLASS_UPPER, NFA_CONCAT , NFA_END_COLL, "
| Postfix notation (int): -1006 -1021 -831 -1014 -1020
`----
Vim then falls back to the backtracking engine (I am not sure why,
because it doesn't call `report_re_switch()`). The way this engine
handles POSIX character classes is basically to take every character
between 1 and 255 that is upper case and add it to one big OR branch.
I believe a character range can contain at most 256 characters, and I
suppose it stops there because of the old 8-bit encodings. That's why
the other upper-case characters are not found: the Cyrillic capitals
above all have code points beyond 255.
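To make the effect concrete, here is a small Python model of that
expansion. It is only a sketch under assumptions, not Vim's code: the
names BT_UPPER, bt_is_upper and find_upper_words are made up,
Python's str.isupper() stands in for Vim's own upper-case test, and
\<...\> is approximated by splitting the line into \w+ words. With the
8-bit set, neither Cyrillic word is found, because every Cyrillic
capital lies above U+00FF (e.g. П is U+041F).

```python
import re

# Simplified model, NOT Vim's source: the backtracking engine expands
# [[:upper:]] by testing code points 1..255 only and collecting the
# upper-case ones into one big alternation. Python's str.isupper()
# stands in for Vim's own upper-case test (an assumption).
BT_UPPER = frozenset(chr(cp) for cp in range(1, 256) if chr(cp).isupper())


def bt_is_upper(ch: str) -> bool:
    """What the expanded 8-bit collection can match."""
    return ch in BT_UPPER


def unicode_is_upper(ch: str) -> bool:
    """A multibyte-aware check, for comparison."""
    return ch.isupper()


def find_upper_words(text: str, is_upper) -> list[str]:
    """Rough stand-in for \\<[[:upper:]]\\{2,}\\>: whole words made of
    at least two characters that all satisfy is_upper."""
    words = re.findall(r"\w+", text)  # \w is Unicode-aware for str patterns
    return [w for w in words if len(w) >= 2 and all(is_upper(c) for c in w)]


line = "05. ПЕСНЯ О ГЕРОЯХ муз. А. Давиденко, М. Коваля и Б. Шехтера"
print(find_upper_words(line, bt_is_upper))       # []
print(find_upper_words(line, unicode_is_upper))  # ['ПЕСНЯ', 'ГЕРОЯХ']
```

For plain ASCII input such as "ABC def GH" both checks agree, which is
why the limitation only shows up with non-Latin-1 text.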
However, if you manually switch to the NFA regexp engine
(:set regexpengine=2), it works again. I am a bit puzzled why compiling
the pattern succeeds this time.
I think an alternative (and faster) way would be to use the \u atom
instead of `[[:upper:]]`.
Best,
Christian