[vim/vim] Support for matching multiple words (#7163)

Yegappan Lakshmanan

Oct 18, 2020, 12:30:01 AM
to vim/vim, Subscribed

When a pattern has multiple words, fuzzy match each word (in any order)
and return the matching strings sorted by the match score.


You can view, comment on, or merge this pull request online at:

  https://github.com/vim/vim/pull/7163

Commit Summary

  • Support for matching multiple words

You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.

codecov[bot]

Oct 18, 2020, 12:57:14 AM
to vim/vim, Subscribed

Codecov Report

Merging #7163 into master will increase coverage by 0.00%.
The diff coverage is 94.02%.

@@           Coverage Diff           @@
##           master    #7163   +/-   ##
=======================================
  Coverage   88.73%   88.73%
=======================================
  Files         148      148
  Lines      162113   162164   +51
=======================================
+ Hits       143843   143900   +57
+ Misses      18270    18264    -6

Impacted Files        Coverage Δ
src/search.c          92.64% <94.02%> (+0.03%) ⬆️
src/clipboard.c       83.31% <0.00%> (-0.11%) ⬇️
src/sign.c            95.03% <0.00%> (+0.08%) ⬆️
src/gui.c             63.35% <0.00%> (+0.14%) ⬆️
src/if_xcmdsrv.c      88.90% <0.00%> (+0.17%) ⬆️
src/gui_gtk_x11.c     59.06% <0.00%> (+0.24%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 335e671...a6126b7.

Bram Moolenaar

Oct 18, 2020, 9:14:12 AM
to vim/vim, Subscribed

Hmm, so if I match "one two" on a text that contains "two one", it still matches? Isn't that a bit unexpected? If I would match "one_two" then it would not match "two_one", right? Thus we are making white space very special. Not sure if that is the goal.

Yegappan Lakshmanan

Oct 18, 2020, 1:01:13 PM
to vim_dev, reply+ACY5DGDJLAMFQDQXM6...@reply.github.com, vim/vim, Subscribed
Hi Bram,

On Sun, Oct 18, 2020 at 6:14 AM Bram Moolenaar <vim-dev...@256bit.org> wrote:
>
> Hmm, so if I match "one two" on a text that contains "two one", it
> still matches? Isn't that a bit unexpected?
>

Yes. If you search for "one two", it will match both "one two" and
"two one". Of course, the "one two" match will score higher than
"two one". This is useful because the user then doesn't need to
know the order in which the words occur in the text.

Popular fuzzy matching tools such as fzf support this type
of matching. Several users have also asked for this feature.

>
> If I would match "one_two" then it would not match "two_one", right?
> Thus we are making white space very special. Not sure if that is the
> goal.
>

Yes. The white space separated text is treated as separate words.
Each word is fuzzy matched separately.

Regards,
Yegappan
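
The behaviour described in the message above can be tried with a short sketch. It assumes a Vim build that already contains this change; the candidate list is made up for illustration.

```vim
" Hypothetical input: the same two words in both orders.
let candidates = ['one two', 'two one']

" Each white-space separated word of the pattern is fuzzy matched on its
" own, so both candidates are returned, best score first.
echo matchfuzzy(candidates, 'one two')

" Without white space the pattern is a single word, so the characters
" must appear in that order and only 'one two' can match.
echo matchfuzzy(candidates, 'onetwo')
```

Comparing the two calls shows how much meaning the white space in the pattern carries, which is the point raised in the review.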

bfrg

Oct 18, 2020, 4:17:00 PM
to vim/vim, vim-dev ML, Comment

Wouldn't it be better to make this optional? How can we fuzzy search for a literal white space character (not uncommon in file paths on Windows)?


Maxim Kim

Oct 18, 2020, 4:22:58 PM
to vim_dev

> Hmm, so if I match "one two" on a text that contains "two one", it still matches? Isn't that a bit unexpected? If I would match "one_two" then it would not match "two_one", right? Thus we are making white space very special. Not sure if that is the goal.

This is actually what I would expect from fuzzy matching these days. All "modern" fuzzy matchers do this (fzf, leaderf, clap, etc.).

Yegappan Lakshmanan

Oct 18, 2020, 5:28:28 PM
to vim_dev, reply+ACY5DGD66LBJ7N6H3D...@reply.github.com, vim/vim, vim-dev ML, Comment
Hi,

On Sun, Oct 18, 2020 at 1:16 PM bfrg <vim-dev...@256bit.org> wrote:

> Wouldn't it be better to make this optional? How can we fuzzy search for a literal white space character (not uncommon in file paths on Windows)?


A character after a space is given an additional bonus score. So when you search
for a file path containing a space using a pattern that contains a space, that file
path will be listed first, because it scores higher than a file path without a
space in it.

- Yegappan
 

Bram Moolenaar

Oct 19, 2020, 6:14:50 AM
to vim/vim, vim-dev ML, Comment

We already have the optional {dict} argument. How about adding an item in there to disallow reordering words?
Naming it isn't so easy, "keep order" has a completely different meaning. Perhaps "sequential"?


Gary Johnson

Oct 19, 2020, 2:13:27 PM
to reply+ACY5DGH666X2GGVGUM...@reply.github.com, vim...@googlegroups.com
On 2020-10-19, Bram Moolenaar wrote:
> We already have the optional {dict} argument. How about adding an item in there
> to disallow reordering words?
> Naming it isn't so easy, "keep order" has a completely different meaning.
> Perhaps "sequential"?

"Retain order"? "Keep sequence"? Just "sequential" doesn't convey
the meaning very well, at least not to me.

Regards,
Gary

Yegappan Lakshmanan

Oct 20, 2020, 1:32:29 AM
to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

  • 9431e23 Add support for optionally matching only words that occur in a sequence. Add gap penalty to the score
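
A minimal sketch of the option added by this commit, assuming it is exposed as the matchseq key of the optional {dict} argument (the name used in the follow-up messages of this thread). The sample list is hypothetical.

```vim
let candidates = ['one two', 'two one']

" Default: the words may appear in any order, so both entries match.
echo matchfuzzy(candidates, 'two one')

" With matchseq the characters must occur in the given sequence, so
" 'one two' is filtered out.
echo matchfuzzy(candidates, 'two one', {'matchseq': 1})
```

Keeping any-order matching as the default and making the stricter behaviour opt-in follows the suggestion to use the existing {dict} argument.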


lacygoill

Oct 20, 2020, 4:03:03 AM
to vim/vim, vim-dev ML, Comment

This is a welcome addition. Emulating this feature in Vim script was not trivial, and it was time-consuming. I ran a brief test, and it seems to work. However, I've noticed that the matchseq key is ignored when filtering a list of dictionaries instead of a list of strings.

Here is a test which currently fails, while I think it should pass:

vim9script
[#{text: 'one two'}, #{text: 'two one'}]
  ->matchfuzzy('two one', #{key: 'text', matchseq: true})
  ->assert_equal([{'text': 'two one'}])


lacygoill

Oct 20, 2020, 4:13:57 AM
to vim/vim, vim-dev ML, Comment

> Emulating this feature in Vim script was not trivial, and it was time-consuming.

Just to give an idea. Before the patch:

vim9script

def Permutations(l: list<string>): list<list<string>>
    if len(l) == 0
        return [[]]
    endif
    var ret = []
    for sublistPermutation in Permutations(l[1:])
        for permutation in InsertItemAtAllPositions(l[0], sublistPermutation)
            ret += [permutation]
        endfor
    endfor
    return ret
enddef

def InsertItemAtAllPositions(item: string, l: list<string>): list<list<string>>
    var ret = []
    for i in range(len(l) + 1)
        ret += [ (i == 0 ? [] : l[0 : i - 1]) + [item] + l[i : ] ]
    endfor
    return ret
enddef

var matchfuzzypos: list<any>
var filtered_source: list<dict<string>>
var pos: list<list<number>>

var tokens = split('one two three')
g:source = ['...']
if len(tokens) >= 4
    var rest = tokens[2:]->join()->substitute('\s\+', '', 'g')
    tokens = [tokens[0], tokens[1], rest]
endif

matchfuzzypos = tokens
    ->Permutations()
    ->map({_, v -> join(v, '')})
    ->map({_, v -> matchfuzzypos(g:source, v, #{key: 'text'})})
    ->reduce({a, v -> [a[0] + v[0], a[1] + v[1]]})

After the patch:

vim9script
var source = ['...']
var filtered_source: list<dict<string>>
var pos: list<list<number>>
[filtered_source, pos] = matchfuzzypos(source, 'one two three', #{key: 'text'})


lacygoill

Oct 20, 2020, 4:32:43 AM
to vim/vim, vim-dev ML, Comment

I've noticed something else, although I'm not sure whether it could or should be changed.

When matchfuzzy*() filters a sequence of words, it is allowed to have some overlapping between the positions of 2 different words.

For example:

echo matchfuzzy(['ftplugin-docs'], 'fun undo')
['ftplugin-docs']

fun has been matched on these positions:

v   v  v
ftplugin-docs

undo has been matched on these positions:

ftplugin-docs
    ^  ^ ^^

If we place the two sets of positions on top of each other, we can see that the second set starts before the first one ends:

v   v  v
ftplugin-docs
    ^  ^ ^^

Note that fzf(1) does the same thing, so maybe that's what people expect (I don't; I would expect overlapping to be disallowed). I suspect that changing this would make the performance drastically decrease; if so, forget about this post.


lacygoill

Oct 20, 2020, 4:36:10 AM
to vim/vim, vim-dev ML, Comment

> I don't; I would expect overlapping to be disallowed

Although, if the overlapping matches are scored lower than the non-overlapping ones, that's not an issue.


Yegappan Lakshmanan

Oct 20, 2020, 7:49:22 AM
to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

  • 35bfad3 Matching sequence of words doesn't work with a dictionary


Yegappan Lakshmanan

Oct 20, 2020, 7:50:11 AM
to vim_dev, reply+ACY5DGHWPQ6ZUPJGCQ...@reply.github.com, vim/vim, vim-dev ML, Comment
Hi,

On Tue, Oct 20, 2020 at 1:03 AM lacygoill <vim-dev...@256bit.org> wrote:

> This is a welcome addition. Emulating this feature in Vim script was not trivial, and it was time-consuming. I ran a brief test, and it seems to work. However, I've noticed that the matchseq key is ignored when filtering a list of dictionaries instead of a list of strings.
>
> Here is a test which currently fails, while I think it should pass:
>
> vim9script
> [#{text: 'one two'}, #{text: 'two one'}]
>   ->matchfuzzy('two one', #{key: 'text', matchseq: true})
>   ->assert_equal([{'text': 'two one'}])



Thanks for reporting this issue. I have updated the PR with a fix for this.

Regards,
Yegappan 

Yegappan Lakshmanan

Oct 20, 2020, 7:53:39 AM
to vim_dev, reply+ACY5DGF3MCCVNIPPJG...@reply.github.com, vim/vim, vim-dev ML, Comment
Hi,

On Tue, Oct 20, 2020 at 1:32 AM lacygoill <vim-dev...@256bit.org> wrote:

> I've noticed something else, although I'm not sure whether it could or should be changed.
>
> When matchfuzzy*() filters a sequence of words, it is allowed to have some overlapping between the positions of 2 different words.
>
> For example:
>
> echo matchfuzzy(['ftplugin-docs'], 'fun undo')
> ['ftplugin-docs']
>
> fun has been matched on these positions:
>
> v   v  v
> ftplugin-docs
>
> undo has been matched on these positions:
>
> ftplugin-docs
>     ^  ^ ^^
>
> If we place the two sets of positions on top of each other, we can see that the second set starts before the first one ends:
>
> v   v  v
> ftplugin-docs
>     ^  ^ ^^
>
> Note that fzf(1) does the same thing, so maybe that's what people expect (I don't; I would expect overlapping to be disallowed). I suspect that changing this would make the performance drastically decrease; if so, forget about this post.



Yes. Each word in the search pattern is separately fuzzy matched from the start
of the text. So you may have overlapping matches. As you have already
observed, fzf also uses a similar search algorithm.

Regards,
Yegappan
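
To inspect the overlap discussed above, matchfuzzypos() can be used, since it returns the matched character positions alongside the matching strings. This is a small sketch that assumes a Vim build with this change. The result is indexed rather than unpacked because the number of lists returned is not stated in this thread.

```vim
let result = matchfuzzypos(['ftplugin-docs'], 'fun undo')

" result[0] holds the matching strings and result[1] holds one list of
" character positions per match. How the positions of the two words are
" combined is not described in this thread, so no output is shown here.
echo result[0]
echo result[1]
```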

ProgMetalSlug

Oct 20, 2020, 8:20:05 AM
to vim/vim, vim-dev ML, Comment

What is considered a "word"? Does this take into account 'iskeyword'?


Yegappan Lakshmanan

Oct 20, 2020, 8:32:02 AM
to vim_dev, reply+ACY5DGBNFNXG5CUFAM...@reply.github.com, vim/vim, vim-dev ML, Comment
Hi,

On Tue, Oct 20, 2020 at 5:20 AM ProgMetalSlug <vim-dev...@256bit.org> wrote:

> What is considered a "word"? Does this take into account 'iskeyword'?



Currently the matchfuzzy() function considers a series of characters
separated by white space (a space character or a tab) to be a "word".
It does not treat characters outside 'iskeyword' as word separators.

- Yegappan
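
The word rule described above can be sketched as follows. The split() call only mimics the described tokenisation (it is not the C implementation), and the sample strings are made up.

```vim
" White space (spaces or tabs) separates words; '_' does not.
echo split("one  two\tthree_four")
" ['one', 'two', 'three_four']

" 'one_two' is a single word, so its characters must appear in order.
echo matchfuzzy(['two_one'], 'one_two')
" []

" 'one two' is two words, each fuzzy matched separately.
echo matchfuzzy(['two_one'], 'one two')
```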

Yegappan Lakshmanan

Oct 20, 2020, 9:38:29 AM
to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

  • 4894b8f Ignore white space in search pattern


Yegappan Lakshmanan

Oct 20, 2020, 4:40:50 PM
to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

  • 42216bd Support for matching multiple words. Add support for optionally matching only words that occur in a sequence. Add gap penalty to the score

Yegappan Lakshmanan

Oct 20, 2020, 4:45:46 PM
to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

Yegappan Lakshmanan

Oct 22, 2020, 10:05:06 PM
to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

  • 4b93f09 Don't reorder items with the same score

Bram Moolenaar

Oct 23, 2020, 6:39:38 AM
to vim/vim, vim-dev ML, Comment

Are there any open remarks? Otherwise, let me know when this is ready to include.


Yegappan Lakshmanan

Oct 23, 2020, 10:18:57 AM
to vim_dev, reply+ACY5DGEQZ6VRRL5VQD...@reply.github.com, vim/vim, vim-dev ML, Comment
Hi Bram,

On Fri, Oct 23, 2020 at 3:39 AM Bram Moolenaar <vim-dev...@256bit.org> wrote:

> Are there any open remarks? Otherwise, let me know when this is ready to include.



No. There are no open remarks about this change. This is ready
to be included.

Regards,
Yegappan
 

Bram Moolenaar

Oct 23, 2020, 10:51:03 AM
to vim/vim, vim-dev ML, Comment

Closed #7163 via 8ded5b6.


Maxim Kim

Oct 23, 2020, 11:07:30 AM
to vim_dev
Thx! Looks really good!

