[vim/vim] Greedy pattern doesn't match far enough? (Issue #17197)

D. Ben Knoble

Apr 23, 2025, 10:49:58 AM
to vim/vim, Subscribed
benknoble created an issue (vim/vim#17197)

Steps to reproduce

  1. echo matchstr('"\""', '^"\%([^"]\|""\|\\"\)\+"') yields "\", not "\""
  2. But echo matchstr('"\""', '^"\%(\\"\|""\|[^"]\)\+"'), which should be equivalent thanks to \+ being greedy and backtracking (?), yields "\"".

Expected behaviour

Both patterns should match the whole string. (See tpope/vim-fugitive#2395 and https://vi.stackexchange.com/a/46757/10604 for downstream impact.)

The shell agrees:

printf '"\\""\n' | grep -E --only-matching '^"([^"]|""|\\")+"'
# => "\""

Version of Vim

9.1.1048

Environment

OS: macOS 12.7.6
shell: Zsh 5.9

Logs and stack traces


Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.Message ID: <vim/vim/issues/17197@github.com>

Christian Brabandt

Apr 23, 2025, 2:36:08 PM
to vim/vim, Subscribed
chrisbra left a comment (vim/vim#17197)

Weird. It doesn't work in either regex engine. But this pattern works: ^"\%([^"]\|""\|\\"\)\+"$


dkearns

Apr 23, 2025, 2:55:59 PM
to vim/vim, Subscribed
dkearns left a comment (vim/vim#17197)

printf '"\\""\n' | grep -E --only-matching '^"([^"]|""|\\")+"'
=> "\""
printf '"\\""\n' | grep -P --only-matching '^"([^"]|""|\\")+"'
=> "\""

The alternation semantics are different for POSIX regex. POSIX takes the longest alternative, \\", whereas Vim, like PCRE, takes the leftmost alternative that matches, [^"].

> But this pattern works: ^"\%([^"]\|""\|\\"\)\+"$

That forces it to try the longer alternation branch.
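
The ordering effect can be reproduced outside Vim. A minimal sketch, assuming Python's re module as a stand-in: it is not Vim's engine, but it is also a backtracking engine that tries alternatives from left to right. The patterns are translated from Vim's \%(...\) and \| syntax to (?:...) and |, and the names class_first, escape_first and matched are made up for this illustration.

```python
import re

subject = r'"\""'  # the four characters: " \ " "

# Translations of the two Vim patterns from the issue (assumed equivalent).
class_first = r'^"(?:[^"]|""|\\")+"'
escape_first = r'^"(?:\\"|""|[^"])+"'
# Anchoring at the end rejects the short match and forces backtracking
# into the later alternatives of the first repetition.
class_first_anchored = class_first + "$"


def matched(pattern: str) -> str:
    """Return the text matched at the start of subject, or '' if none."""
    m = re.match(pattern, subject)
    return m.group(0) if m else ""


print(matched(class_first))           # "\"
print(matched(escape_first))          # "\""
print(matched(class_first_anchored))  # "\""
```

With class_first, [^"] consumes the backslash, the closing quote then matches the very next character, and the engine stops at the first overall success without ever trying \\" at that position.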


dkearns

Apr 23, 2025, 3:30:06 PM
to vim/vim, Subscribed
dkearns left a comment (vim/vim#17197)

> Weird. It doesn't work in either regex engine. But this pattern works: ^"\%([^"]\|""\|\\"\)\+"$


dkearns

Apr 23, 2025, 3:30:07 PM
to vim/vim, Subscribed

Closed #17197 as completed.


dkearns

Apr 23, 2025, 3:31:05 PM
to vim/vim, Subscribed

Reopened #17197.


D. Ben Knoble

Apr 23, 2025, 5:01:14 PM
to vim/vim, Subscribed
benknoble left a comment (vim/vim#17197)

> The alternation semantics are different for POSIX regex. POSIX takes the longest alternative, \\", whereas Vim, like PCRE, takes the leftmost alternative that matches, [^"].
>
> > But this pattern works: ^"\%([^"]\|""\|\\"\)\+"$
>
> That forces it to try the longer alternation branch.

Well that certainly explains it, and was my suspicion. Is this documented? (Away from computer.)

Unfortunately, fugitive cannot anchor with $ and probably needs to find another way to take the longest match 🤔


Aliaksei Budavei

Apr 23, 2025, 6:06:42 PM
to vim/vim, Subscribed
zzzyxwvut left a comment (vim/vim#17197)

A bit of pattern massaging may help:

" Original patterns:
let o1 = '^"\%([^"]\|""\|\\"\)\+"'
let o2 = '^"\%(\\"\|""\|[^"]\)\+"'

" Permuted, modified patterns (with [^\\"] and atomic grouping):
let m1 = '^"\%(\%([^\\"]\|""\|\\"\)\+\)\@>"'
let m2 = '^"\%(\%([^\\"]\|\\"\|""\)\+\)\@>"'
let m3 = '^"\%(\%(""\|[^\\"]\|\\"\)\+\)\@>"'
let m4 = '^"\%(\%(""\|\\"\|[^\\"]\)\+\)\@>"'
let m5 = '^"\%(\%(\\"\|[^\\"]\|""\)\+\)\@>"'
let m6 = '^"\%(\%(\\"\|""\|[^\\"]\)\+\)\@>"'

" Sample strings:
let s1 = '"\""'
let s2 = '"\"\""'
let s3 = '"\"\"\""'
let s4 = '""""'
let s5 = '""""""'
let s6 = '"foo \"foo\" bar \"bar\""'
let s7 = '"\"foo\" foo \"bar\" bar"'
let s8 = '"\"foo\" \"bar\""'
let s9 = '"foo bar"'
let s10 = '"!"'

for e in ['\%#=1', '\%#=2']
  for p in [m1, m2, m3, m4, m5, m6, o1, o2]
    for s in [s1, s2, s3, s4, s5, s6, s7, s8, s9, s10]
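      let m = matchstr(s, e .. p)
      " Report every sample that is not matched in full; !=# compares
      " case-sensitively regardless of 'ignorecase'.
      if s !=# m
        echo printf("\"%s\" != matchstr(%s, %s%s)", m, s, e, p)
      endif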
    endfor
  endfor
endfor

echo '.'
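
The key change in the modified patterns is that [^\\"] no longer accepts a backslash, so each branch starts with a different character and at most one branch can match at any position. A rough sketch of that check, assuming Python's re module as a stand-in backtracking engine (not Vim's) and leaving out the atomic grouping, because Python only supports (?>...) from version 3.11; the names branches, samples and mismatches are made up here:

```python
import re
from itertools import permutations

# The three branches with the backslash excluded from the class.
branches = [r'[^\\"]', r'""', r'\\"']

# The sample strings s1..s10 from the Vim script.
samples = [
    r'"\""', r'"\"\""', r'"\"\"\""', '""""', '""""""',
    r'"foo \"foo\" bar \"bar\""', r'"\"foo\" foo \"bar\" bar"',
    r'"\"foo\" \"bar\""', '"foo bar"', '"!"',
]


def mismatches():
    """Collect (pattern, sample) pairs where the sample is not matched in full."""
    bad = []
    for order in permutations(branches):
        pattern = '^"(?:' + "|".join(order) + ')+"'
        for s in samples:
            m = re.match(pattern, s)
            if m is None or m.group(0) != s:
                bad.append((pattern, s))
    return bad


print(mismatches())  # []
```

All six orderings match every sample in full here. Without the atomic group, backtracking can still return a shorter prefix for an unterminated input such as "a"" (it matches "a"), which is the case \@> guards against in the Vim patterns.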


Christian Brabandt

Apr 24, 2025, 4:42:29 AM
to vim/vim, Subscribed
chrisbra left a comment (vim/vim#17197)

I couldn't find this in the POSIX specs for ERE, but it is defined in the PCRE documentation.

How about this to make it more clear?

diff --git a/runtime/doc/pattern.txt b/runtime/doc/pattern.txt
index 857a3e648..20caccb5f 100644
--- a/runtime/doc/pattern.txt
+++ b/runtime/doc/pattern.txt
@@ -339,6 +339,10 @@ For starters, read chapter 27 of the user manual |usr_27.txt|.
    that matches one of the branches.  Example: "foo\|beep" matches "foo" and
    matches "beep".  If more than one branch matches, the first one is used.

+   Note: Vim behaves according to perl-compatible regular expressions (PCRE)
+   here, POSIX Extended Regular Expressions (ERE) prefer to use the longest
+   branch instead of the first matching branch.
+
    pattern ::=     branch
                or  branch \| branch
                or  branch \| branch \| branch


dkearns

Apr 24, 2025, 8:41:41 AM
to vim/vim, Subscribed
dkearns left a comment (vim/vim#17197)

> How about this to make it more clear?

POSIX regex might use the shorter branch if that yields the longest overall match. E.g., it uses the shorter "fo" branch here:

printf 'foobarbaz' | grep -P --only-matching '^(foo|fo)(obarbaz|bar)' # => foobar
printf 'foobarbaz' | grep -E --only-matching '^(foo|fo)(obarbaz|bar)' # => foobarbaz

This sort of difference applies in other contexts as well.

printf 'foobarbaz' | grep -P --only-matching '^foo(bar)?(barbaz)?' # => foobar
printf 'foobarbaz' | grep -E --only-matching '^foo(bar)?(barbaz)?' # => foobarbaz

The behaviour is documented in the "matched" section at https://pubs.opengroup.org/onlinepubs/9799919799/

I think the help is OK as it is and documenting the differences to POSIX might only muddy the waters. It's probably clearer just to document what Vim's engine does, in isolation, and it does that.
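
The difference between the two grep results for ^(foo|fo)(obarbaz|bar) can be sketched by enumerating the branch combinations by hand. This is only an illustration for literal branches, not how either engine is implemented; the helper name candidates is made up, and Python's re module stands in for a PCRE-like backtracking engine.

```python
import re
from itertools import product

text = "foobarbaz"
groups = [("foo", "fo"), ("obarbaz", "bar")]  # (foo|fo)(obarbaz|bar)


def candidates(groups, text):
    """Branch combinations, in left-to-right order, that match at the start of text."""
    joined = ("".join(choice) for choice in product(*groups))
    return [c for c in joined if text.startswith(c)]


found = candidates(groups, text)
first_match = found[0]               # leftmost-first, as with grep -P
longest_match = max(found, key=len)  # longest overall, as with grep -E

print(first_match)    # foobar
print(longest_match)  # foobarbaz
print(re.match(r'^(foo|fo)(obarbaz|bar)', text).group(0))  # foobar
```

A backtracking engine stops at the first combination that succeeds, while the POSIX rule compares all successful combinations and keeps the longest.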


Christian Brabandt

Apr 24, 2025, 9:58:32 AM
to vim/vim, Subscribed
chrisbra left a comment (vim/vim#17197)

Okay, then unless anybody has objections, it's time to close this as not a bug.


D. Ben Knoble

Apr 24, 2025, 12:35:32 PM
to vim/vim, Subscribed
benknoble left a comment (vim/vim#17197)

> > How about this to make it more clear?
>
> POSIX regex might use the shorter branch if that yields the longest overall match.

Vim also (generally) uses the longest overall match, right? Or at least, I would have thought so, except for this case 🤔 Well, I guess maybe I'm wrong about that.

> I think the help is OK as it is and documenting the differences to POSIX might only muddy the waters. It's probably clearer just to document what Vim's engine does, in isolation, and it does that.

Right, fair enough.


Christian Brabandt

Apr 24, 2025, 1:24:29 PM
to vim/vim, Subscribed

Closed #17197 as not planned.

