echo matchstr('"\""', '^"\%([^"]\|""\|\\"\)\+"')
yields "\", not "\"". However,
echo matchstr('"\""', '^"\%(\\"\|""\|[^"]\)\+"')
which should be equivalent thanks to \+ being greedy and backtracking (?), yields "\"".
Both patterns should match the whole string. (See tpope/vim-fugitive#2395 and https://vi.stackexchange.com/a/46757/10604 for downstream impact.)
The shell agrees:
printf '"\\""\n' | grep -E --only-matching '^"([^"]|""|\\")+"'
# => "\""
Vim version: 9.1.1048
OS: macOS 12.7.6
shell: Zsh 5.9
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Weird. It doesn't work in either regex engine. But this pattern works: ^"\%([^"]\|""\|\\"\)\+"$
—
printf '"\\""\n' | grep -E --only-matching '^"([^"]|""|\\")+"'
=> "\""
printf '"\\""\n' | grep -P --only-matching '^"([^"]|""|\\")+"'
=> "\"
The alternation semantics are different for POSIX regex: it takes the longest alternative, \\", whereas Vim, like PCRE, takes the leftmost one, [^"].
But this pattern works: ^"\%([^"]\|""\|\\"\)\+"$
The trailing $ forces it to try the longer alternation branch.
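A minimal shell sketch of the difference described above, assuming GNU grep built with PCRE support: -P gives a backtracking, leftmost-first engine (like Vim, per the comment above) and -E gives POSIX leftmost-longest matching. The helpers pcre_match and ere_match are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers: print the part of $1 matched by regex $2.
# Assumes GNU grep; -P requires a build with PCRE support.
pcre_match() { printf '%s\n' "$1" | grep -P -o -- "$2"; }
ere_match()  { printf '%s\n' "$1" | grep -E -o -- "$2"; }

input='"\""'                    # the four characters "\""
char_first='^"([^"]|""|\\")+"'  # catch-all [^"] tried first
esc_first='^"(\\"|""|[^"])+"'   # escaped quote \\" tried first

# Leftmost-first (PCRE): the first branch that lets the rest succeed wins,
# so the branch order changes the result.
pcre_match "$input" "$char_first"   # prints "\"
pcre_match "$input" "$esc_first"    # prints "\""

# Leftmost-longest (POSIX ERE): the branch order does not matter.
ere_match "$input" "$char_first"    # prints "\""
ere_match "$input" "$esc_first"     # prints "\""
```

The second PCRE call mirrors the reporter's reordered Vim pattern: with \\" tried first, the backslash is consumed as part of an escaped quote before [^"] gets a chance to take it alone.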
—
Closed #17197 as completed.
—
Reopened #17197.
—
The alternation semantics are different for POSIX regex. It takes the longest alternation, \\", and Vim, like PCRE, takes the leftmost, [^"].
But this pattern works:
^"\%([^"]\|""\|\\"\)\+"$
That forces it to try the longer alternation branch.
Well that certainly explains it, and was my suspicion. Is this documented? (Away from computer.)
Unfortunately, fugitive cannot anchor with $ and probably needs to find another way to take the longest match 🤔
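Why the $ trick does not carry over can be sketched in the shell, under the hypothetical assumption that the quoted string may be followed by more text on the same line. GNU grep -P stands in for Vim's leftmost-first engine, and pcre_match is a made-up helper.

```shell
#!/bin/sh
# Hypothetical helper: print the PCRE match of regex $2 in $1 (GNU grep -P).
pcre_match() { printf '%s\n' "$1" | grep -P -o -- "$2"; }

anchored='^"([^"]|""|\\")+"$'

# On the bare string, "$" forces backtracking into the \\" branch.
pcre_match '"\""' "$anchored"                          # prints "\""

# With text after the closing quote, "$" can never be satisfied,
# so grep prints nothing and exits with status 1.
pcre_match '"\"" rest' "$anchored" || echo 'no match'  # prints no match
```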
—
A bit of pattern massaging may help:
" Original patterns:
let o1 = '^"\%([^"]\|""\|\\"\)\+"'
let o2 = '^"\%(\\"\|""\|[^"]\)\+"'

" Permuted, modified patterns (with [^\\"] and atomic grouping):
let m1 = '^"\%(\%([^\\"]\|""\|\\"\)\+\)\@>"'
let m2 = '^"\%(\%([^\\"]\|\\"\|""\)\+\)\@>"'
let m3 = '^"\%(\%(""\|[^\\"]\|\\"\)\+\)\@>"'
let m4 = '^"\%(\%(""\|\\"\|[^\\"]\)\+\)\@>"'
let m5 = '^"\%(\%(\\"\|[^\\"]\|""\)\+\)\@>"'
let m6 = '^"\%(\%(\\"\|""\|[^\\"]\)\+\)\@>"'

" Sample strings:
let s1 = '"\""'
let s2 = '"\"\""'
let s3 = '"\"\"\""'
let s4 = '""""'
let s5 = '""""""'
let s6 = '"foo \"foo\" bar \"bar\""'
let s7 = '"\"foo\" foo \"bar\" bar"'
let s8 = '"\"foo\" \"bar\""'
let s9 = '"foo bar"'
let s10 = '"!"'

" Try every pattern on every sample with both engines (\%#=1 backtracking,
" \%#=2 NFA) and report any case that does not match the whole string.
for e in ['\%#=1', '\%#=2']
  for p in [m1, m2, m3, m4, m5, m6, o1, o2]
    for s in [s1, s2, s3, s4, s5, s6, s7, s8, s9, s10]
      let m = matchstr(s, e .. p)
      if s != m
        echo printf("\"%s\" != matchstr(%s, %s%s)", m, s, e, p)
      endif
    endfor
  endfor
endfor
echo '.'
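The key change in m1 to m6 is that the catch-all class becomes [^\\"], which also excludes the backslash, so the branches no longer compete for the same characters and their order stops mattering. A rough cross-check in the shell, assuming GNU grep -P as a stand-in for Vim's engine: the three patterns below are hand translations of m1, m3 and m5 to PCRE syntax ((?:...) for \%(...\), (?>...) for \@>), pcre_match is a hypothetical helper, and the ' rest' sample is a hypothetical case where $ cannot be used.

```shell
#!/bin/sh
# Hypothetical helper: print the PCRE match of regex $2 in $1 (GNU grep -P).
pcre_match() { printf '%s\n' "$1" | grep -P -o -- "$2"; }

# Rough PCRE equivalents of m1, m3 and m5 (atomic group, then closing quote).
class_first='^"(?>(?:[^\\"]|""|\\")+)"'
dbl_first='^"(?>(?:""|[^\\"]|\\")+)"'
esc_first='^"(?>(?:\\"|[^\\"]|"")+)"'

# Each sample yields the same match under all three branch orders.
for s in '"\""' '"\"" rest' '"foo \"foo\" bar"' '""""'; do
  for p in "$class_first" "$dbl_first" "$esc_first"; do
    printf '%s -> %s\n' "$s" "$(pcre_match "$s" "$p")"
  done
done
```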
—
I couldn't find this in the POSIX specs for ERE, but it is defined in the PCRE documentation.
How about this to make it more clear?
diff --git a/runtime/doc/pattern.txt b/runtime/doc/pattern.txt
index 857a3e648..20caccb5f 100644
--- a/runtime/doc/pattern.txt
+++ b/runtime/doc/pattern.txt
@@ -339,6 +339,10 @@ For starters, read chapter 27 of the user manual |usr_27.txt|.
    that matches one of the branches. Example: "foo\|beep" matches "foo" and
    matches "beep". If more than one branch matches, the first one is used.
 
+   Note: Vim behaves according to perl-compatible regular expressions (PCRE)
+   here, POSIX Extended Regular Expressions (ERE) prefer to use the longest
+   branch instead of the first matching branch.
+
    pattern ::= branch
 		or branch \| branch
 		or branch \| branch \| branch
—
How about this to make it more clear?
POSIX regex might use the shorter branch if it returns the longest overall match. E.g., it uses the shorter "fo" branch here:
printf 'foobarbaz' | grep -P --only-matching '^(foo|fo)(obarbaz|bar)' # => foobar
printf 'foobarbaz' | grep -E --only-matching '^(foo|fo)(obarbaz|bar)' # => foobarbaz
This sort of difference applies in other contexts as well.
printf 'foobarbaz' | grep -P --only-matching '^foo(bar)?(barbaz)?' # => foobar
printf 'foobarbaz' | grep -E --only-matching '^foo(bar)?(barbaz)?' # => foobarbaz
The behaviour is documented in the "matched" section at https://pubs.opengroup.org/onlinepubs/9799919799/
I think the help is OK as it is and documenting the differences to POSIX might only muddy the waters. It's probably clearer just to document what Vim's engine does, in isolation, and it does that.
—
Okay, then unless anybody has any objections, it's time to close this as not a bug.
—
How about this to make it more clear?
POSIX regex might use the shorter branch if it returns the longest overall match.
Vim also (generally) uses the longest overall match, right? Or at least, I would have thought so, except for this case 🤔 Well, I guess maybe I'm wrong about that.
I think the help is OK as it is and documenting the differences to POSIX might only muddy the waters. It's probably clearer just to document what Vim's engine does, in isolation, and it does that.
Right, fair enough.
—
Closed #17197 as not planned.