[vim/vim] Regular expressions with backreferences to a group including .* and \n are matching where they shouldn’t (#6239)


Chris Morgan

Jun 11, 2020, 12:16:19 PM
to vim/vim, Subscribed

Describe the bug
Backreferences are supposed to match exactly the text that the referenced group matched. But once .* and \n are both involved in the group, the regular expression engine can match different text at the backreference, as if the .* were expanded a second time.

To Reproduce
Detailed steps to reproduce the behavior:

  1. Run vim --clean (or gvim --clean, etc.)
  2. Insert the following:
    foo
    bar
    barnaby
    baz
  3. Search for any duplicated lines: /\(^.*\n\)\1<Enter>
  4. Observe that this matches bar\nbarnaby\n. (It’s like it searched for \(^.*\)\n\1.*\n instead.)

Expected behavior
There should be no matches: in the case that did match, \1 is bar\n, which differs from the text that follows it, barnaby\n (the fourth character is n, not a newline).

(\(^.*$\)\n\1\n does not exhibit this bug.)
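
For comparison, here is a minimal sketch of the expected backreference semantics using Python's re module. This is a different engine from Vim's, so it only illustrates what the pattern should do, not how Vim implements it. The names BUFFER, DUPLICATE_LINE and find_duplicate are illustrative, and the buffer is assumed to end with a newline.

```python
import re

# Buffer contents from the reproduction steps, one "\n" per line end (assumed).
BUFFER = "foo\nbar\nbarnaby\nbaz\n"

# Python spelling of Vim's \(^.*\n\)\1 (groups are written unescaped in Python).
DUPLICATE_LINE = re.compile(r"(^.*\n)\1", re.MULTILINE)


def find_duplicate(text):
    """Return the matched text, or None when no line is immediately repeated."""
    match = DUPLICATE_LINE.search(text)
    return match.group(0) if match else None


print(find_duplicate(BUFFER))                        # None: "bar\n" != "barn"
print(repr(find_duplicate("foo\nbar\nbar\nbaz\n")))  # 'bar\nbar\n'
```

With strict backreference semantics the reproduction buffer yields no match, and only a genuinely repeated line does.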

Environment (please complete the following information):

  • Vim 8.2.814
  • OS: Arch Linux



Christian Brabandt

Jun 11, 2020, 12:53:27 PM
to vim/vim, Subscribed

Can confirm, and the 'regexpengine' setting does not seem to make a difference.

John Little

Jun 11, 2020, 9:33:41 PM
to vim/vim, Subscribed

Also, \(^.*\n\)\1 does not match duplicated lines at the end of the file.

Matthijs van Duin

Jul 2, 2021, 5:54:10 PM
to vim/vim, Subscribed

It gets weirder: the pattern ^\(.*\n\)\1xyz matches neither

foo
foo
xyz

nor

foo
foobar
xyz

yet the pattern ^\(.*\n\)\1fyz matches both of these.
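
As a reference point, the sketch below runs the same two patterns through Python's re module. It is a different engine, used here only to show what strict backreference semantics would give. The names SAME, DIFFERENT, XYZ, FYZ and matches are illustrative, and both sample texts are assumed to end with a newline.

```python
import re

# The two sample buffers from this comment.
SAME = "foo\nfoo\nxyz\n"
DIFFERENT = "foo\nfoobar\nxyz\n"

# Python spellings of the two Vim patterns (groups unescaped).
XYZ = r"^(.*\n)\1xyz"
FYZ = r"^(.*\n)\1fyz"


def matches(pattern, text):
    """True if the pattern matches anywhere in text, with ^ anchored per line."""
    return re.search(pattern, text, re.MULTILINE) is not None


for name, pattern in (("xyz", XYZ), ("fyz", FYZ)):
    for label, text in (("same", SAME), ("different", DIFFERENT)):
        print(name, label, matches(pattern, text))
```

Under these semantics only the xyz pattern on the first buffer matches; the other three combinations do not, which is the opposite of the Vim behaviour reported here.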

Matthijs van Duin

Jul 2, 2021, 6:07:27 PM
to vim/vim, Subscribed

It looks like the bug has nothing to do with .*, only with backreferences to groups ending in a newline: @chris-morgan's example still reproduces the bug if you search for ^\(bar\n\)\1 instead of ^\(.*\n\)\1. Similarly, replacing .* with foo in my previous comment does not change the results.
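
A small sketch of the expected result for the literal variant, again using Python's re module purely as a reference engine (not Vim's implementation). BUFFER and first_match are illustrative names, and the buffer is assumed to be the one from the original report with a trailing newline.

```python
import re

# Buffer from the original report (assumed to end with a newline).
BUFFER = "foo\nbar\nbarnaby\nbaz\n"


def first_match(pattern, text):
    """Return the first matched text, or None if the pattern does not match."""
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(0) if match else None


# Literal group ending in a newline, Python spelling of ^\(bar\n\)\1.
print(first_match(r"^(bar\n)\1", BUFFER))             # None
print(repr(first_match(r"^(bar\n)\1", "bar\nbar\n")))  # 'bar\nbar\n'
```

The literal group behaves the same as the .* version under strict semantics: no match on the report's buffer, and a match only when bar really is repeated.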
