[vim/vim] function to substitute text in buffer without side effects (#5632)


lacygoill

Feb 12, 2020, 7:13:47 PM
to vim/vim, Subscribed

Is your feature request related to something that is currently hard to do? Please describe.

It is difficult to substitute all matches of a pattern with a replacement text inside the current buffer without side effects.


For example, consider this text in a buffer:

pat pat PAT
pat pat PAT
pat pat PAT

The cursor is on line 2 column 5, and this substitution is executed:

:%s/pat/rep/g

The cursor is now on line 3 column 1:

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'norm! 2Gw' +'set vbs=1|echo getpos(".")|qa!'

[0, 2, 5, 0]
    ^  ^

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'norm! 2Gw' +'%s/pat/rep/g' +'set vbs=1|echo getpos(".")|qa!'

[0, 3, 1, 0]
    ^  ^

In a script, to preserve the cursor position and the view, winsaveview() and winrestview() must be invoked:

let view = winsaveview()
%s/pat/rep/g
call winrestview(view)

A substitution also adds an entry in the jumplist:

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'set vbs=1|jumps|qa!'

 jump line  col file/text
   1     4    0

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'%s/pat/rep/g' +'set vbs=1|jumps|qa!'

 jump line  col file/text
   2     4    0
   1     3    0 rep rep PAT

In a script, to preserve the jumplist, :keepjumps must be invoked:

keepjumps %s/pat/rep/g
^^^^^^^^^

A substitution also alters the change marks:

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'norm! 2Gwyiw' +'set vbs=1|marks|qa!' 2>&1 | grep '\[\|\]'

 [      2    4 pat pat PAT
 ]      2    6 pat pat PAT

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'norm! 2Gwyiw' +'%s/pat/rep/g' +'set vbs=1|marks|qa!' 2>&1 | grep '\[\|\]'

 [      1    0 rep rep PAT
 ]      4    0

In a script, to preserve the change marks, :lockmarks must be invoked (requires 8.1.2302):

lockmarks %s/pat/rep/g
^^^^^^^^^

A substitution also alters the search register:

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'/foo' +'set vbs=1|echo @/|qa!'

foo

$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'/foo' +'%s/pat/rep/g' +'set vbs=1|echo @/|qa!'

pat

In a script, to preserve the search register, :keeppatterns must be invoked:

keeppatterns %s/pat/rep/g
^^^^^^^^^^^^

If the number of changed lines is greater than &report, then a message is written on the command-line. In a script, to prevent a message from being displayed, :silent must be invoked:

silent %s/pat/rep/g
^^^^^^

And the e flag must be used to avoid E486 when the pattern is not found:

silent %s/pat/rep/ge
                   ^

If the user has set 'gdefault', then the meaning of the g flag is reversed.

# 'gdefault' is *not* set
$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'set nogdefault' +'%s/pat/rep/g' +'%p|qa!'
rep rep PAT
rep rep PAT
rep rep PAT

# 'gdefault' *is* set
$ vim -es -Nu NONE -i NONE +"pu=repeat(['pat pat PAT'], 3)" +'set gdefault' +'%s/pat/rep/g' +'%p|qa!'
rep pat PAT
rep pat PAT
rep pat PAT

In a script, to be sure all matches are replaced, the value of the option must be inspected:

execute '%s/pat/rep/'..(&gdefault ? '' : 'g')

All in all, if you combine everything, the interactive command:

%s/pat/rep/g

Becomes this in a script which tries to have as few side effects as possible and be reliable:

let view = winsaveview()
silent execute 'keepjumps keeppatterns lockmarks %s/pat/rep/e'..(&gdefault ? '' : 'g')
call winrestview(view)

Describe the solution you'd like

A buf_substitute() function which would take 5 arguments:

{lnum}, {end}, {pat}, {sub}, {flags}

The {pat}, {sub}, {flags} arguments would be interpreted as in substitute(), while the {lnum} and {end} arguments would be interpreted as the first and last line of a range of lines in the buffer where the substitution should occur. It would have none of the side effects described earlier.

In a script, %s/pat/rep/g could then be re-written like this:

call buf_substitute(1, '$', 'pat', 'rep', 'g')

Describe alternatives you've considered

I've considered writing a custom function which would avoid all the side effects documented earlier, and then using it whenever I need to do a substitution in a buffer. Somewhat similar to maktaba#buffer#Substitute().

But it would create a dependency in each plugin I would write.

Besides, it would create a difference between the context where the function is called, and the one where the substitution is executed. This would make it hard to refer to a script-local or function-local variable in the pattern or replacement arguments, if in the end the substitution is not executed in the context of the current script/function but in the context of another function defined in another script.
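
For illustration, here is a minimal sketch of what such a custom function might look like in legacy Vim script. The name BufSubstitute() and its argument order are hypothetical (they simply mirror the proposal above), it is not how maktaba implements its helper, and it assumes that the pattern and replacement contain no unescaped `/` and that Vim is at least 8.1.2302 so that :lockmarks applies to :s:

```vim
" Hypothetical helper: name and signature are assumptions, not an existing API.
function! BufSubstitute(lnum, end, pat, sub, flags) abort
  " Assumption: a:pat and a:sub do not contain an unescaped '/'.
  let view = winsaveview()
  " When 'gdefault' is set, the meaning of the g flag is reversed,
  " so toggle it to keep the caller's intent.
  let flags = a:flags
  if &gdefault
    let flags = flags =~# 'g' ? substitute(flags, 'g', '', 'g') : flags .. 'g'
  endif
  try
    " e: no E486 when the pattern is not found.
    silent execute printf('keepjumps keeppatterns lockmarks %s,%ss/%s/%s/e%s',
          \ a:lnum, a:end, a:pat, a:sub, flags)
  finally
    " Restore the cursor position and the view even if :s fails.
    call winrestview(view)
  endtry
endfunction

" Usage, equivalent to the interactive :%s/pat/rep/g
call BufSubstitute(1, '$', 'pat', 'rep', 'g')
```

Note that a `\=` expression passed as the replacement would be evaluated inside the helper, not in the caller, which is exactly the scoping limitation described in the paragraph above.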


I've also considered using a combination of setline()+getline()+map()+substitute():

call getline(1, '$')->map('substitute(v:val, "pat", "rep", "g")')->setline(1)

But it's less readable than:

call buf_substitute(1, '$', 'pat', 'rep', 'g')

And it's much slower than :s:

mv /tmp/version8.txt{,bak}; vim -Nu NONE +'h version8' +'saveas /tmp/version8.txt|qa!'; for i in {1..10}; do vim -es -Nu NONE -i NONE +"let time = reltime()|%s/pat/rep/g|0pu=matchstr(reltimestr(reltime(time)), '.*\..\{,3}').' seconds to run :s'" +'1p|qa!' /tmp/version8.txt; done

0.010 seconds to run :s
0.009 seconds to run :s
0.009 seconds to run :s
0.012 seconds to run :s
0.009 seconds to run :s
0.008 seconds to run :s
0.008 seconds to run :s
0.008 seconds to run :s
0.008 seconds to run :s
0.011 seconds to run :s

# average: 0.009 seconds

mv /tmp/version8.txt{,bak}; vim -Nu NONE +'h version8' +'saveas /tmp/version8.txt|qa!'; for i in {1..10}; do vim -es -Nu NONE -i NONE +"let time = reltime()|call getline(1, '$')->map('substitute(v:val, \"pat\", \"rep\", \"g\")')->setline(1)|0pu=matchstr(reltimestr(reltime(time)), '.*\..\{,3}').' seconds to run setline()->...'" +'1p|qa!' /tmp/version8.txt; done

0.129 seconds to run setline()->...
0.123 seconds to run setline()->...
0.123 seconds to run setline()->...
0.128 seconds to run setline()->...
0.130 seconds to run setline()->...
0.162 seconds to run setline()->...
0.154 seconds to run setline()->...
0.161 seconds to run setline()->...
0.128 seconds to run setline()->...
0.123 seconds to run setline()->...

# average: 0.136 seconds

That's 15 times slower.



bfrg

Feb 12, 2020, 8:45:25 PM
to vim/vim, Subscribed

Another problem with using setline() is that it deletes all text-properties in that line. :substitute on the other hand will readjust the text-properties.

There's already setbufline(), getbufline(), appendbufline() and deletebufline().
Wouldn't substitutebufline() or just subbufline() be better?
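
The difference between setline() and :substitute described above could be checked with a small sketch like the following. It assumes a Vim compiled with +textprop, and the property type name 'demo' is made up for the example; the output is deliberately not shown here, the point is only to compare the two prop_list() results:

```vim
" Assumes +textprop; 'demo' is a hypothetical property type name.
call prop_type_add('demo', {})
call setline(1, ['pat pat PAT', 'pat pat PAT'])
" Attach a property to the second "pat" of each line.
call prop_add(1, 5, {'length': 3, 'type': 'demo'})
call prop_add(2, 5, {'length': 3, 'type': 'demo'})
" Line 1 is rewritten with setline(), line 2 with :s.
call setline(1, substitute(getline(1), 'pat', 'replaced', ''))
2s/pat/replaced/
" Compare what is left of the properties on each line.
echo prop_list(1)
echo prop_list(2)
```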

lacygoill

Feb 13, 2020, 4:56:30 AM
to vim/vim, Subscribed

Ah yes you're right, setline() does not preserve text properties.

I guess you could install a callback with listener_add() to update the properties, but I don't know much about it, and it may be too costly if the substitution is made often and/or on a big buffer.


And yes, the name subbufline() is more consistent with the existing buffer functions; I updated the OP to use it instead of buf_substitute() (which I originally used to make a parallel with substitute(), like win_execute() and execute()).

Bram Moolenaar

Feb 13, 2020, 2:43:19 PM
to vim/vim, Subscribed

To fit in with getbufline() and setbufline() the first argument should specify the buffer.
Then we have six arguments, that's a bit much. Might still be the best way.
subbufline({buf}, {start}, {end}, {pat}, {sub}, {flags})

Perhaps we can use substitute() but make the first argument a list to specify the lines?
It won't return the result then, that is inconsistent.

lacygoill

Apr 19, 2021, 7:47:06 PM
to vim/vim, Subscribed

subbufline() would be even more useful now in Vim9.

Starting from 8.2.2784, the replacement field of a substitution command is compiled when it's an expression evaluated with the \= syntax.

So, this works now:

vim9script
def Replace()
    setline(1, 'aaa')
    var rep = 'bbb'
    s/aaa/\=rep/
enddef
Replace()

Just like it did in legacy Vim script:

fu Replace()
    call setline(1, 'aaa')
    let rep = 'bbb'
    s/aaa/\=rep/
endfu
call Replace()

But if some other part of our substitution command needs to be dynamic, like the range or the pattern, then we need :exe:

fu Replace()
    call setline(1, 'aaa')
    let pat = 'aaa'
    let rep = 'bbb'
    exe 's/' .. pat .. '/\=rep/'
endfu
call Replace()

But this doesn't work in Vim9:

vim9script
def Replace()
    setline(1, 'aaa')
    var pat = 'aaa'
    var rep = 'bbb'
    exe 's/' .. pat .. '/\=rep/'
enddef
Replace()
E121: Undefined variable: rep

That's because :exe suppresses the compilation of \=rep. :exe is compiled, but not the command it executes. So, when :exe is executed at runtime, Vim can't find rep on the stack.
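
One possible workaround, shown here only as a sketch, is to stop referring to the variable with \= and instead concatenate its value into the command as a literal replacement. This assumes the replacement is plain text; escape() is used so that `\`, `/`, `&` and `~` in the value are not interpreted by :s:

```vim
vim9script
def Replace()
    setline(1, 'aaa')
    var pat = 'aaa'
    var rep = 'bbb'
    # Insert the value of rep literally, so nothing has to be looked up
    # by name when :exe runs the command.
    exe 's/' .. pat .. '/' .. escape(rep, '\/&~') .. '/'
enddef
Replace()
```

It does not help when the replacement really needs to be an expression, for example one that refers to a capturing group.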

With subbufline(), there would be no such issue, because we could use a simple lambda (which is compiled):

subbufline(buf, lnum1, lnum2, pat, (_) => rep, flags)
                                   ^--------^

And this would make the code much more readable than :exe:

  • :exe makes us lose syntax highlighting in the literal parts of the command
  • :exe might require nesting a quote inside a string, which can be tricky
  • :exe makes it difficult to determine what's literal and what's evaluated (and when it's evaluated)

And it would work no matter which part of the command needs to be dynamic: range, pattern, replacement, flags...

lacygoill

Apr 19, 2021, 8:17:56 PM
to vim/vim, Subscribed

Actually, there would be no need for a lambda in this simple case:

subbufline(buf, lnum1, lnum2, pat, rep, flags)

The lambda is for when we refer to a capturing group or the whole match:

subbufline(buf, lnum1, lnum2, pat, (m) => m[1], flags)
                                   ^---------^

Bram Moolenaar

Apr 20, 2021, 2:13:59 PM
to vim/vim, Subscribed

Have you tried using getline()/map()/setline() with a compiled argument to map()?

lacygoill

Apr 20, 2021, 6:56:23 PM
to vim/vim, Subscribed

For :s:

:%s/pat/rep/g

The results are:

0.011 seconds to run :s
0.010 seconds to run :s
0.011 seconds to run :s
0.010 seconds to run :s
0.010 seconds to run :s
0.009 seconds to run :s
0.010 seconds to run :s
0.009 seconds to run :s
0.010 seconds to run :s
0.009 seconds to run :s

Average: 10ms.

For map() with a reference to a :def function:

def Rep(_, v: string): string
  return substitute(v, 'pat', 'rep', 'g')
enddef
def Substitute()
  getline(1, '$')
    ->map(Rep)
    ->setline(1)
enddef

The results are:

0.158 seconds to run map()
0.157 seconds to run map()
0.152 seconds to run map()
0.156 seconds to run map()
0.159 seconds to run map()
0.155 seconds to run map()
0.155 seconds to run map()
0.155 seconds to run map()
0.156 seconds to run map()
0.160 seconds to run map()

Average: 156ms.

For map() with a lambda:

def Substitute()
  getline(1, '$')
    ->map((_, v) => substitute(v, 'pat', 'rep', 'g'))
    ->setline(1)
enddef

The results are:

0.125 seconds to run map()
0.133 seconds to run map()
0.131 seconds to run map()
0.125 seconds to run map()
0.124 seconds to run map()
0.125 seconds to run map()
0.128 seconds to run map()
0.126 seconds to run map()
0.132 seconds to run map()
0.131 seconds to run map()

Average: 128ms.

It's still 13 times slower.

I'm a bit surprised that a lambda is faster than a function reference, since both the lambda and the Rep() function are compiled. I guess there is a cost to look up a function by name.

Bram Moolenaar

Apr 21, 2021, 6:20:55 AM
to vim/vim, Subscribed


[...]

> Test for `getline()` + `map()` + `substitute()` + `setline()`:
> ```bash
> #!/bin/bash
>
> cat <<'EOF' >/tmp/test.vim
>   vim9script
>   def Substitute()
>     getline(1, '$')
>       ->map((_, v) => substitute(v, 'pat', 'rep', 'g'))
>       ->setline(1)
>   enddef
>   defcompile
>   var time = reltime()
>   Substitute()
>   :0put =reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run map()'
>   :1p
>   qa!
> EOF
>
> vim -Nu NONE +'h version8 | saveas! /tmp/version8.txt | qa!'
> for i in {1..10}; do
>   vim -es -N -u NONE -U NONE -i NONE -S /tmp/test.vim /tmp/version8.txt
> done
> ```
> Results:
>
> 0.188 seconds to run map()
> 0.188 seconds to run map()
> 0.187 seconds to run map()
> 0.190 seconds to run map()
> 0.188 seconds to run map()
> 0.186 seconds to run map()
> 0.186 seconds to run map()
> 0.185 seconds to run map()
> 0.188 seconds to run map()
> 0.186 seconds to run map()
>
> Average: 187ms.
>
> That's 11 times slower.

The main overhead here is likely that map() has to call the function for
every line. Try using a for loop and replacing one line at a time.
Sketch (didn't try it):

for lnum in range(1, line('$'))
  getline(lnum)->substitute('pat', 'rep', 'g')->setline(lnum)
endfor


--
Birthdays are healthy. The more you have them, the longer you live.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

lacygoill

Apr 21, 2021, 6:43:22 AM
to vim/vim, Subscribed

Ah yes, I forgot that :for loops could be much faster than map(). Here, it's a bit faster (around 10%), but still much slower than :s:

#!/bin/bash

cat <<'EOF' >/tmp/test.vim
  vim9script
  def Substitute()
    for lnum in range(1, line('$'))
      getline(lnum)->substitute('pat', 'rep', 'g')->setline(lnum)
    endfor
  enddef
  defcompile
  var time = reltime()
  Substitute()
  :0put =reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Substitute()'
  :1p
  qa!
EOF

vim -Nu NONE +'h version8 | saveas! /tmp/version8.txt | qa!'
for i in {1..10}; do
  vim -es -N -u NONE -U NONE -i NONE -S /tmp/test.vim /tmp/version8.txt
done

Results:

0.112 seconds to run Substitute()
0.112 seconds to run Substitute()
0.121 seconds to run Substitute()
0.113 seconds to run Substitute()
0.111 seconds to run Substitute()
0.114 seconds to run Substitute()
0.114 seconds to run Substitute()
0.110 seconds to run Substitute()
0.112 seconds to run Substitute()
0.112 seconds to run Substitute()

Average: 113 ms.

Bram Moolenaar

Apr 21, 2021, 9:15:31 AM
to vim/vim, Subscribed


> Ah yes, I forgot that `:for` loops could be much faster than `map()`. Here, it's a bit faster (around 10%), but still much slower than `:s`:
> ```bash
> #!/bin/bash
>
> cat <<'EOF' >/tmp/test.vim
>   vim9script
>   def Substitute()
>     for lnum in range(1, line('$'))
>       getline(lnum)->substitute('pat', 'rep', 'g')->setline(lnum)
>     endfor
>   enddef
>   defcompile
>   var time = reltime()
>   Substitute()
>   :0put =reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run Substitute()'
>   :1p
>   qa!
> EOF
>
> vim -Nu NONE +'h version8 | saveas! /tmp/version8.txt | qa!'
> for i in {1..10}; do
>   vim -es -N -u NONE -U NONE -i NONE -S /tmp/test.vim /tmp/version8.txt
> done
> ```
> Results:
>
> 0.112 seconds to run Substitute()
> 0.112 seconds to run Substitute()
> 0.121 seconds to run Substitute()
> 0.113 seconds to run Substitute()
> 0.111 seconds to run Substitute()
> 0.114 seconds to run Substitute()
> 0.114 seconds to run Substitute()
> 0.110 seconds to run Substitute()
> 0.112 seconds to run Substitute()
> 0.112 seconds to run Substitute()
>
> Average: 113 ms.

That's disappointing. Profiling is needed to see where most of the
time is spent. Could be compiling the regexp.

--
hundred-and-one symptoms of being an internet addict:
136. You decide to stay in a low-paying job teaching just for the
free Internet access.



lacygoill

Apr 23, 2021, 5:57:35 AM
to vim/vim, Subscribed

Here are the 10 most expensive function calls extracted from the full log:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 13.24      0.27     0.27        1     0.27     1.57  exec_instructions
  6.86      0.41     0.14        1     0.14     0.24  readfile
  5.88      0.53     0.12  1588710     0.00     0.00  nfa_regcomp
  5.88      0.65     0.12 64464905     0.00     0.00  utfc_ptr2len
  4.90      0.75     0.10  1588706     0.00     0.00  u_savecommon
  4.41      0.84     0.09  3177420     0.00     0.00  post2nfa
  3.43      0.91     0.07  5621920     0.00     0.00  vim_strchr
  2.94      0.97     0.06 23884934     0.00     0.00  lalloc
  2.94      1.03     0.06 14298376     0.00     0.00  clear_tv
  2.70      1.09     0.06  1588710     0.00     0.00  vim_regcomp
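
To relate this to the "could be compiling the regexp" hypothesis, the self-time percentages of the three regexp-compilation entries in the table can be added up. A small sketch, with the numbers copied from the flat profile above (the variable names are made up for the example):

```vim
" Self-time percentages copied from the flat profile above.
let regex_rows = [['nfa_regcomp', 5.88], ['post2nfa', 4.41], ['vim_regcomp', 2.70]]
let total = 0.0
for [name, pct] in regex_rows
  let total += pct
endfor
echo printf('%.2f%% of self time in regexp compilation', total)
```

That comes to about 13% for these three entries alone, which is comparable to the share of exec_instructions at the top of the table.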

lacygoill

Apr 23, 2021, 5:57:35 AM
to vim/vim, Subscribed

I did, and it's still much slower than a regular :s command.

Test for a :s command:

#!/bin/bash

cat <<'EOF' >/tmp/test.vim
  vim9script
  def Substitute()
    :%s/pat/rep/g
  enddef
  defcompile
  var time = reltime()
  Substitute()
  :0put =reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run :s'
  :1p
  qa!
EOF

vim -Nu NONE +'h version8 | saveas! /tmp/version8.txt | qa!'
for i in {1..10}; do
  vim -es -N -u NONE -U NONE -i NONE -S /tmp/test.vim /tmp/version8.txt
done

Results:

0.017 seconds to run :s
0.017 seconds to run :s
0.016 seconds to run :s
0.017 seconds to run :s
0.016 seconds to run :s
0.017 seconds to run :s
0.017 seconds to run :s
0.016 seconds to run :s
0.016 seconds to run :s
0.016 seconds to run :s

Average: 17ms.

Test for getline() + map() + substitute() + setline():

#!/bin/bash

cat <<'EOF' >/tmp/test.vim
  vim9script
  def Substitute()
    getline(1, '$')
      ->map((_, v) => substitute(v, 'pat', 'rep', 'g'))
      ->setline(1)
  enddef
  defcompile
  var time = reltime()
  Substitute()
  :0put =reltime(time)->reltimestr()->matchstr('.*\..\{,3}') .. ' seconds to run map()'
  :1p
  qa!
EOF

vim -Nu NONE +'h version8 | saveas! /tmp/version8.txt | qa!'
for i in {1..10}; do
  vim -es -N -u NONE -U NONE -i NONE -S /tmp/test.vim /tmp/version8.txt
done

Results:

0.188 seconds to run map()
0.188 seconds to run map()
0.187 seconds to run map()
0.190 seconds to run map()
0.188 seconds to run map()
0.186 seconds to run map()
0.186 seconds to run map()
0.185 seconds to run map()
0.188 seconds to run map()
0.186 seconds to run map()

Average: 187ms.

That's 11 times slower.

I still don't know how to profile the C code, but I tried to use gprof(1). Here are the reports I get:
