Dear vim hackers,
I hope this is the right mailing list to ask.
Looking into vim's source code, I see that your regular expression engine is
based on Henry Spencer's library. Which of his libraries is it? [1] It doesn't
look like the version from Tcl, which would be a hybrid NFA/DFA. I hope
backtracking is memory-limited, or can you provoke Vim crashes with some
pathological regexp?
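To make this question concrete: the danger with a plain backtracking matcher
is that a pattern like `a*a*a*a*a*a*b`, run against a long string of `a`s,
tries a huge number of ways to split the input before it finally fails. The
toy matcher below (literals, `.` and postfix `*` only) is a sketch under my own
assumptions, not Vim's or Spencer's code; the names `parse`, `match_at`,
`BudgetExceeded` and the `budget` parameter are hypothetical. It shows one way
to bound the damage: count steps and abort with an error instead of grinding
on. A size cap on an explicit backtracking stack would work analogously for
memory.

```python
# Toy backtracking matcher: literals, '.', and postfix '*' only.
# Hypothetical sketch illustrating a work cap; not Vim's or Spencer's code.


class BudgetExceeded(Exception):
    """Raised when the matcher exceeds its (hypothetical) step budget."""


def parse(pattern):
    """Split a pattern into (atom, starred) pairs: "a*b" -> [("a", True), ("b", False)]."""
    tokens = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def match_at(tokens, text, pos, budget):
    """Return the end of a match anchored at pos, or None.

    Raises BudgetExceeded once more than `budget` steps have been taken.
    """
    steps = 0

    def atom_ok(atom, i):
        return i < len(text) and (atom == "." or text[i] == atom)

    def go(t, i):
        nonlocal steps
        steps += 1
        if steps > budget:
            raise BudgetExceeded(f"gave up after {budget} steps")
        if t == len(tokens):
            return i  # the whole pattern matched
        atom, starred = tokens[t]
        if not starred:
            return go(t + 1, i + 1) if atom_ok(atom, i) else None
        # Greedy star: consume as much as possible, then back off one
        # character at a time. Several consecutive stars multiply the
        # number of ways to split the input.
        j = i
        while atom_ok(atom, j):
            j += 1
        while j >= i:
            end = go(t + 1, j)
            if end is not None:
                return end
            j -= 1
        return None

    return go(0, pos)
```

With a budget of 100000 steps, `match_at(parse("a*b"), "aaab", 0, 100_000)`
returns 4, while `a*a*a*a*a*a*b` against 30 `a`s raises `BudgetExceeded`
instead of running for a very long time.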
But I am even more interested in how you implemented backwards searches (`?`).
Apparently, you can search across EOLs by including `\n`, so you didn't take
the route of using single lines as the subject string and moving on to the
previous line on failure. Some vi implementations and editors do that, thus
disallowing patterns that span multiple lines. On the other hand,
I see that backwards searches do not produce leftmost-longest matches: e.g.
`[a-z]*` matches the empty string, in contrast to a forward search, which
produces the longest match. But then `.*` matches an entire line, which
confuses me a bit. `\(.\|\n\)*` matches an entire line but nothing beyond it,
whereas a forward search for the same pattern matches up to the end of the
document. (And yes, memory limiting does appear to be in place.)

I also notice that backwards searches are fast even on huge files. It looks as
if you analyze the regexp to find candidate start positions and then apply
forward matching from those candidates!? And as a fallback, I suppose, you
try to match at every character position going backwards? Have you written
down the resulting, somewhat inconsistent, regexp semantics anywhere?
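This is how I imagine the candidate-start approach, as a minimal Python
sketch; it is a guess, not a description of Vim's actual code. The
hypothetical `search_backward` walks start positions backwards from the cursor
and runs an anchored forward match (`re.Pattern.match` with a `pos` argument)
at each one. If the caller supplies a literal prefix of the pattern (the
hypothetical `literal_prefix` parameter; a real engine would have to derive it
by analyzing the regexp), `str.rfind` jumps straight to the next candidate;
otherwise every position is tried, which is the fallback.

```python
import re


def search_backward(pattern, text, cursor, literal_prefix=""):
    """Return (start, end) of the match starting closest before `cursor`, or None.

    Hypothetical sketch: candidate starts are found by scanning backwards,
    either with str.rfind on a literal prefix of the pattern (supplied by the
    caller here) or, as a fallback, by trying every position. Each candidate
    is verified with an anchored forward match, so the match may extend past
    `cursor`; only its start is constrained.
    """
    rx = re.compile(pattern)
    pos = cursor - 1
    while pos >= 0:
        if literal_prefix:
            # Nearest occurrence of the prefix starting at or before pos.
            pos = text.rfind(literal_prefix, 0, pos + len(literal_prefix))
            if pos < 0:
                return None
        m = rx.match(text, pos)  # forward match anchored at pos
        if m:
            return m.start(), m.end()
        pos -= 1  # try the next earlier position
    return None
```

Note that even this naive version shows the kind of inconsistency described
above: `search_backward(r"[a-z]*", "foo bar", 7)` returns `(6, 7)`, i.e. just
`r`, whereas a forward search starting before `bar` would report `bar` as a
whole. The start is the closest candidate, but the length is whatever forward
matching yields from there.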
I am not a vim user: I wrote my own editor [2] and am just interested in how
you solved the typical problems of implementing backwards search, so please
excuse any stupid questions. At least, I couldn't find any obvious answers on
vimhelp.org or in the FAQ.
Feel free to point me to particular places in the code.
Best regards,
Robin
[1]: https://garyhouston.github.io/regex/
[2]: https://sciteco.fmsbw.de/